Exploring the Yakuza 0 Script Text


I recently came across this full script of the main story of Yakuza 0 written by Reddit user Snow_Guard. Yakuza 0 is one of my favorite video games and is especially compelling in its story structure. The game alternates between two characters, Kiryu and Majima, who never actually meet until the epilogue. However, several characters interact with both Kiryu and Majima throughout the story.

One of the first things I wanted to do with the data was make a conversation network, in which each character is a node, connected to another character if the two were in the same scene or conversation. The original script's HTML was already formatted cleanly enough to read in and split at its section breaks, but I went ahead and coded the conversation for each line manually to account for sudden narrative jumps. I cut out the unnamed characters (e.g. "Yakuza Thug") to focus on the main ones. I also cut out characters with fewer than 5 lines.
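As a rough illustration of that step, here is a minimal sketch in Python. It assumes the hand-coded data is a list of (conversation_id, speaker) pairs, one per line of dialogue. The toy records, the function name `build_network`, and the `min_lines` parameter are all made up for this example and are not taken from the actual script.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy records: (conversation_id, speaker), one per line of dialogue.
lines = [
    (1, "Kiryu"), (1, "Nishikiyama"), (1, "Kiryu"),
    (2, "Majima"), (2, "Sagawa"),
    (3, "Makoto"), (3, "Majima"), (3, "Makoto"),
    (4, "Kiryu"), (4, "Makoto"),
]

def build_network(lines, min_lines=1):
    """Return (node_sizes, edge_weights) for a conversation co-occurrence network."""
    # Node size: number of lines spoken by each character.
    line_counts = Counter(speaker for _, speaker in lines)
    # Drop characters with fewer than min_lines lines.
    keep = {s for s, n in line_counts.items() if n >= min_lines}

    # Collect the set of kept speakers in each conversation.
    speakers_by_conv = defaultdict(set)
    for conv_id, speaker in lines:
        if speaker in keep:
            speakers_by_conv[conv_id].add(speaker)

    # Edge weight: number of conversations a pair of characters shares.
    edge_weights = Counter()
    for speakers in speakers_by_conv.values():
        for a, b in combinations(sorted(speakers), 2):
            edge_weights[(a, b)] += 1

    node_sizes = {s: line_counts[s] for s in keep}
    return node_sizes, edge_weights

node_sizes, edge_weights = build_network(lines, min_lines=1)
filtered_sizes, filtered_edges = build_network(lines, min_lines=2)
```

Sorting each pair before counting keeps ("Kiryu", "Makoto") and ("Makoto", "Kiryu") from being tallied as two different edges.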

Here is the resulting network!


The size of each character node scales with that character's number of lines in the main story, and the thickness of the edge between two nodes scales with the number of conversations they share. As expected, there are some interesting characters that connect the two protagonists, like Makoto or Nishikiyama. We can take a look at the betweenness centrality metric for networks to see which characters are good at bridging all other characters in the network.

Name          Betweenness Centrality
Majima        566.33
Kiryu         447.42
Oda            98.58
Nishikiyama    86.25
Makoto         25.08
Sagawa         12.17
Kuze            2.75
Lee             1.25
Dojima          0.67
Komeki          0.50

I actually expected Makoto to rank higher in betweenness, but Oda and Nishikiyama interact with many more third-party characters, so they sit on more of the shortest paths between other characters, and this is reflected in the betweenness metric.
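For readers who want to see what the metric counts, here is a minimal sketch of unnormalized betweenness centrality on an unweighted, undirected graph, using Brandes' algorithm. The scores in the table are well above 1, which suggests an unnormalized variant, but that is my assumption, as are the toy edges and the function name `betweenness`. Edge weights are ignored here.

```python
from collections import deque

# Hypothetical toy edges forming a simple chain of characters.
edges = [
    ("Nishikiyama", "Kiryu"), ("Kiryu", "Makoto"),
    ("Makoto", "Majima"), ("Majima", "Sagawa"),
]

# Build a symmetric adjacency list.
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of increasing distance from s
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {v: c / 2 for v, c in score.items()}

scores = betweenness(adj)
```

In this toy chain, Makoto sits in the middle and gets the highest score, while the two characters at the ends get zero because no shortest path passes through them.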

I also took a look at what words and phrases are unique to each of the biggest characters. Looking at the 5 characters with the most lines, I computed the tf-idf statistic, which measures how relevant or unique each word is to a given character when considering all characters in the full script. Of course, this is a relatively small amount of text, so take it with a grain of salt.
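To make the statistic concrete, here is a minimal sketch of one common tf-idf formulation, treating everything a character says as a single document: term frequency is a word's share of that character's words, and inverse document frequency is the natural log of (number of characters / number of characters who use the word). The sample dialogue is invented for illustration and is not from the script. The function name `tf_idf` and the simple regex tokenizer are also assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical dialogue, one combined "document" per character.
docs = {
    "Kiryu": "nishiki wait for me nishiki",
    "Makoto": "my brother is waiting for me",
    "Majima": "wait for me makoto",
}

def tf_idf(docs):
    """Return {character: {word: tf-idf score}}."""
    # Lowercase and split into simple word tokens.
    counts = {
        name: Counter(re.findall(r"[a-z']+", text.lower()))
        for name, text in docs.items()
    }
    n_docs = len(counts)

    # Document frequency: how many characters use each word at least once.
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    scores = {}
    for name, word_counts in counts.items():
        total = sum(word_counts.values())
        scores[name] = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in word_counts.items()
        }
    return scores

scores = tf_idf(docs)
top_kiryu = max(scores["Kiryu"], key=scores["Kiryu"].get)
```

Words that every character uses, like "for" and "me" in this toy data, get an idf of zero and drop out, which is why the statistic surfaces names and character-specific vocabulary.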


Nothing too surprising in the tf-idf results. For instance, Kiryu and Nishiki address each other a lot, and Makoto is more likely to mention her brother than many other characters. We can also look at bigrams in the dialogue, or pairs of consecutive words.


Some interesting quirks of each character emerge, like Oda being likely to address Kiryu as "Kiryu-kun." You can also see that "empty lot" scores highly for both Kiryu and Makoto. That is possible because this statistic takes into account all other characters in the story (not just these 5).
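Extracting the bigrams themselves is straightforward. Here is a minimal sketch, assuming a tokenizer that splits on punctuation and hyphens, so that "Kiryu-kun" becomes the pair "kiryu kun". The two sample lines are invented, and `bigrams` is a hypothetical helper name. The resulting counts could then be scored per character with tf-idf in the same way as single words.

```python
import re
from collections import Counter

def bigrams(line):
    """Return the consecutive word pairs in one line of dialogue."""
    tokens = re.findall(r"[a-z']+", line.lower())
    return list(zip(tokens, tokens[1:]))

# Hypothetical lines for a single character.
sample_lines = [
    "Kiryu-kun, about the empty lot.",
    "The empty lot, Kiryu-kun.",
]

# Count bigrams line by line so pairs never span two separate lines.
bigram_counts = Counter()
for line in sample_lines:
    bigram_counts.update(bigrams(line))
```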

Thanks for reading, and please let me know if you have any feedback!