import nltk
import networkx as nx
import itertools
from itertools import product
import matplotlib.pyplot as plt
from pyvis import network as net
NETWORK OF MAGIC : SNA of the Harry Potter Series
The Harry Potter series, by British author JK Rowling, has been one of the most celebrated fantasy-fiction series of the modern world. Rowling created a dark fairytale, rooted in escapism through the magical world of wizards, witches, the Ministry of Magic and, at the center of it all, Hogwarts - the school for witchcraft and wizardry. The series chronicles the life of a young wizard, Harry Potter, and his friends Hermione Granger and Ron Weasley, whom he meets while studying at Hogwarts. The series is built around Harry’s struggle against Lord Voldemort, a rogue wizard who is now back after Harry’s parents gave their lives to destroy him. Needless to say, the books are full of magical battles, spells, charms, potions and creatures.
Personally, Harry Potter has been an integral part of my life. I always found myself very invested in the lives of the characters as well as the different ways in which they used magic. For instance, Hermione Granger was established as a genius witch from the very beginning, with characters proclaiming that “there isn’t a spell that our Hermione cannot do” (Hagrid). Her usage of magic is often diverse and plentiful. Harry Potter, on the other hand, favours particular spells more than others. I would argue that the crux of the series hinges on two spells - Expelliarmus (disarms the opponent) and Expecto Patronum (dispels creatures called Dementors). Given how much the series depends on these main characters, I thought it would be interesting to computationally derive a “network of magic” around characters in the story and analyze whether (or how) this network informs my understanding of the text in turn.
Some Definitions and Workflow details:
- Packages used in the project:
- Natural Language Toolkit (NLTK): For tokenization of the text and cleanup.
- Networkx : For building the network graph of characters and spells cast
- Itertools : For building a dictionary with all possible spell-character combinations to parse the corpus for
- Matplotlib : Plotting and customization aid for networkx graphs
- Pyvis : A package that facilitates integration of interactive network graphs built from the Networkx module into the Jupyter Notebook itself.
- I am specifically defining my network as the “Network of Magic” in order to steer away from the interpretation that the network depicts which character used a particular spell. Rather, I want it to depict the character who was closest to the spell when it was cast. Most of the time, this does turn out to be the same individual who cast the spell. But it’s important to keep this distinction in mind in order to accurately interpret the Social Network Graph.
Load the relevant files
List of Spells: They were sourced from an ebook that I converted into a plain text file. The ebook was created from the spells listed on Harry Potter Wikia. This list also contained spells from Harry Potter spinoff series such as Fantastic Beasts and Where to Find Them, which had to be manually removed. Only names of spells were included. Charms (other than the Banishing Charm, which is a spell) and Hexes were not included in the list.
- Spell names were not always unique as strings. Eg: Protego, Protego Totalum and Protego Horribilis are all different spells that share a first word.
- Some spells had common English words in them such as “Point” in Point Me or “Cave” in Cave Inimicum
List of Characters: I built a list of characters from the Harry Potter Wikipedia
- It was a good mix of major characters and minor characters. Magical creatures such as Fawkes the phoenix, Dobby and Kreacher the house-elves, and Buckbeak the Hippogriff were not included in the list. The list only contained characters defined as “magical human beings”, with the exception of Lord Voldemort, who straddles the line between magical human and magical creature
Text Files of the Harry Potter series: The raw text files were obtained from this GitHub Repository.
- These were not suitable for this project as they were and had to be manually cleaned by correcting line breaks and, most importantly, the names of spells and characters that had been spelled incorrectly. The files also had random special characters inserted between the text (a product of pdf -> text conversion, I presume) that had to be cleaned.
pwd
'/Users/anushasubramanian/Desktop/Harry Potter txt files'
#categorise the spells by their complete names, the one-word spells and the double-word spells
spellsFull = open('Spells.txt').read().split("\n\n")
spells = [spell.split()[0] for spell in spellsFull] # only first words
doubleSpell = [spell.split()[0] for spell in spellsFull if len(spell.split()) == 2] #all two word spells
spellsFull[:10]
['Accio',
'Banishing Charm',
'Aguamenti',
'Alohomora',
'Anapneo',
'Aparecium',
'Avada Kedavra',
'Avis',
'CaveInimicum',
'Colloportus']
characters = open('hp_chars.txt').read().split("\n")
# create a dictionary of all combos of spells and characters
combinedList = [spellsFull, characters]
combos = [p for p in itertools.product(*combinedList)]
comboDict = {combo: 0 for combo in combos}
#example of how the combodict keys looks
list(comboDict)[322:335]
[('Banishing Charm', 'Percy Weasley'),
('Banishing Charm', 'Ron Weasley'),
('Banishing Charm', 'Oliver Wood'),
('Banishing Charm', 'Rose Weasley'),
('Banishing Charm', 'Corban Yaxley'),
('Banishing Charm', 'Blaise Zabini'),
('Aguamenti', 'Hannah Abbott'),
('Aguamenti', 'Ludo Bagman'),
('Aguamenti', 'Bathilda Bagshot'),
('Aguamenti', 'Katie Bell'),
('Aguamenti', 'Cuthbert Binns'),
('Aguamenti', 'Phineas Nigellus Black'),
('Aguamenti', 'Sirius Black')]
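As a small illustration of how this dictionary is built, the sketch below repeats the same steps on two short lists. The names toy_spells, toy_characters and toy_combo_dict are hypothetical stand-ins, not the project's actual files or variables.

```python
from itertools import product

# Hypothetical stand-ins for spellsFull and characters
toy_spells = ["Accio", "Avada Kedavra"]
toy_characters = ["Harry Potter", "Ron Weasley", "Sirius Black"]

# Every (spell, character) pair becomes a key with a starting count of 0
toy_combo_dict = {pair: 0 for pair in product(toy_spells, toy_characters)}

print(len(toy_combo_dict))        # number of pairs = len(spells) * len(characters)
print(list(toy_combo_dict)[:3])   # first few keys, in product order
```

Because the keys are generated spell-first, all pairs for one spell sit next to each other, which matches the ordering seen in the slice above.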
hp1 = open("Book 1 - The Philosopher's Stone.txt").read()
hp2 = open("Book 2 - The Chamber of Secrets.txt").read()
hp3 = open("Book 3 - The Prisoner of Azkaban.txt").read()
hp4 = open("Book 4 - The Goblet of Fire.txt").read()
hp5 = open("Book 5 - The Order of the Phoenix.txt").read()
hp6 = open("Book 6 - The Half Blood Prince.txt").read()
hp7 = open("Book 7 - The Deathly Hallows.txt").read()
After loading my files, it is important that I organize them efficiently into a corpus. The function getTokens takes a text string and returns a list of its tokens without the punctuation. I then re-join this list of tokens into a cleaned-up string of the original text file and organize the results into a dictionary called bookDict
# Tokenize without punctuations
def getTokens(text):
    tokens = [word for word in nltk.word_tokenize(text) if word.isalpha()]
    return tokens
# dictionary containing cleaned text for each book
bookDict = {"Philosopher's Stone" : ' '.join(getTokens(hp1)),
            "Chamber of Secrets" : ' '.join(getTokens(hp2)),
            "Prisoner of Azkaban" : ' '.join(getTokens(hp3)),
            "The Goblet of Fire" : ' '.join(getTokens(hp4)),
            "The Order of the Phoenix" : ' '.join(getTokens(hp5)),
            "The Half Blood Prince" : ' '.join(getTokens(hp6)),
            "The Deathly Hallows" : ' '.join(getTokens(hp7))
           }
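To make the effect of the isalpha() filter concrete, here is a minimal sketch on a hand-written token list. The list sample_tokens is an assumption standing in for tokenizer output, since the real nltk.word_tokenize may split these strings differently.

```python
# Hand-written tokens standing in for nltk.word_tokenize output (an assumption;
# the real tokenizer may split the sentence differently)
sample_tokens = ["Harry", "shouted", ",", "``", "Expelliarmus", "!", "''",
                 "He-Who-Must-Not-Be-Named", "did", "n't", "flinch", "."]

# Same filter as getTokens: keep only purely alphabetic tokens
kept = [tok for tok in sample_tokens if tok.isalpha()]
dropped = [tok for tok in sample_tokens if not tok.isalpha()]

print(kept)
print(dropped)
```

One consequence worth keeping in mind is that any token containing a hyphen, apostrophe or period is removed along with the punctuation, so hyphenated names would not survive this cleanup as single tokens.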
In a series such as Harry Potter (or any fiction, really), a character is never just referred to by a single name. There are first names, last names, nicknames, family names, middle names etc. to be contended with. Specifically, in a fast paced series such as this one, there were two concerns to be dealt with:
The first concern is that deceit and deception are a big part of the entire plot structure. Characters change appearances often and take on varied identities. Characters often go into hiding as part of the plot. Characters are also referenced by their childhood nicknames and magical slurs - many of which are not English words. Here is an example:
- In the 3rd book, Scabbers, Ron Weasley’s pet rat, is revealed to be Peter Pettigrew - Lord Voldemort’s follower and the one who betrayed Harry Potter’s parents back in the day. In Pettigrew’s group of friends, he was also known as “Wormtail” since he had the ability to transform into a rat. Computationally, this means that "Peter", "Pettigrew", "Scabbers", "Wormtail" or any combination of these strings could be used to refer to him.
The second concern is related to the nicknames and alternate names that are widely prevalent in the books. For the most part, characters are referenced by their surname either when they are very close or when there is a tone of hostility to them. But this doesn’t always hold true because there are enough narrative instances, introductions or even general dialogue where the first names are used.
- Harry Potter is a good example. His friends call him “Harry” and “Potter”, authority figures tend to refer to him as “Mr. Potter” and historically, in the magical world, he is known as “The Boy who Lived”, stemming from his survival of Lord Voldemort’s attack when he was still a baby. Similarly, Voldemort’s disciples call him “Lord Voldemort”, his enemies call him “Voldemort” and in the magical world he is known as “He who must not be Named”. Further still, Voldemort is referenced by his birth name in the initial books that trace his origin story - any combination of “Tom Marvolo Riddle Jr”.
Thus the normaliseNames function written below takes in a token (string), looks through the character list to see if the token exists there, either directly or as any other form of a name, and outputs a single normalised result that matches the names that are a part of comboDict.keys()
This was the portion of the project where close and distant reading approaches to analyzing literature intersected the most. From the examples I have given above, it is easy to see that while some of these normalisations could be figured out with a quick Google search, many of the more nuanced ones require knowledge of the Harry Potter Universe as a whole.
def normaliseNames(name):
    """
    returns a standard output string for different utterances of the same name of main characters
    """
    if name.lower() in ["harry", "potter", "mr.potter", "boy who lived "]:
        return "Harry Potter"
    if name.lower() in ["lord voldemort", "voldemort", "he-who-must-not-be-named",
                        "tom", "tom riddle", "tom riddle jr.","tom marvolo riddle", "thomas", "master"]:
        return "Lord Voldemort"
    if name.lower() in ["hermione", "granger", "ms. granger", "miss granger"]:
        return "Hermione Granger"
    if name.lower() in ["peter", "pettigrew", "wormtail", "scabbers"]:
        return "Peter Pettigrew"
    if name.lower() in ["albus", "dumbledore", "headmaster"]:
        return "Albus Dumbledore"
    if name.lower() in ["peverell", "antioch", "cadmus", "ignotus"]:
        return "Peverell Family"
    if name.lower() in ["ron", "ronald", "ronald weasley", "ron weasley"]:
        return "Ron Weasley"
    if name.lower() in ["myrtle", "warren", "moaning myrtle"]:
        return "Moaning Myrtle"
    else:
        # check if the name appears in any first or last names in the character list
        first = [char.split()[0] for char in characters]
        last = [char.split()[1] for char in characters]
        if name in first or name in last:
            for char in characters:
                if name in char:
                    return char
        return "Not a name"
# example of how this function operates
normaliseNames("Shacklebolt")
'Kingsley Shacklebolt'
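The same idea can also be expressed as a lookup table, which may be easier to extend as more aliases are discovered. The sketch below is an alternative under assumptions, not the function used in this project: ALIASES, ALIAS_TO_NAME and normalise_alias are hypothetical names, and the alias lists are abbreviated examples.

```python
# Hypothetical alias table: canonical name -> single-token aliases (abbreviated)
ALIASES = {
    "Harry Potter": ["harry", "potter"],
    "Peter Pettigrew": ["peter", "pettigrew", "wormtail", "scabbers"],
    "Lord Voldemort": ["voldemort", "tom", "riddle"],
}

# Invert the table once so that each lookup is a single dictionary access
ALIAS_TO_NAME = {alias: name
                 for name, aliases in ALIASES.items()
                 for alias in aliases}

def normalise_alias(token):
    """Return the canonical character name for a token, or None if unknown."""
    return ALIAS_TO_NAME.get(token.lower())

print(normalise_alias("Wormtail"))
print(normalise_alias("broomstick"))
```

Returning None for unknown tokens, rather than a sentinel string, is a design choice of this sketch. It keeps the "not a character" case distinct from any real name.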
The Pivot
As I described in the introduction, the network I’m building seeks to find the named entity closest to each spell, provided that entity exists in the list of characters I have supplied. The code here achieves that.
In summary, the code iterates through every item stored in bookDict (the cleaned texts of the 7 Harry Potter books). For each book, the text is re-tokenized and then enumerated to keep track of the token indices. Tokens are scanned until a spell name is found. Once a spell name is found, 2 while loops are run:
Forward distance loop: measures the distance between the spell and the first character name that appears after it (d2)
Reverse distance loop: measures the distance between the spell and the closest character name that appears before it (d1)
reverse character --- spell --- forward character
                   ^d1       ^d2
The smaller of the distances d1 and d2 tells us which character is closest to the spell (whether because they are uttering it, struck by it, or hiding from it). When the two distances are equal, the character before the spell is chosen. 1 is then added to comboDict[key], where key is a tuple of the form (< spell name > , < normalised character name with the shortest distance >)
for key in bookDict:
    book = getTokens(bookDict[key]) #re-tokenize text for every book
    tokens = list(enumerate(book))
    for i,token in tokens:
        forward = 1
        reverse = 1
        distancef = 0
        distancer = 0
        foundf = False
        foundr = False

        if token in spells:
            #extract the name of spell
            if token in doubleSpell:
                spell = token + ' ' + tokens[i+1][1]
                forward = 2
            else:
                spell = token

            # run a forward loop
            while foundf == False:
                charNameF = normaliseNames(tokens[i+forward][1])

                if charNameF in characters:
                    keyf = tuple([spell, charNameF])
                    foundf = True
                distancef +=1
                forward+=1

            # run a reverse loop
            while foundr == False:
                charNameR = normaliseNames(tokens[i-reverse][1])

                if charNameR in characters:
                    keyr = tuple([spell,charNameR])
                    foundr = True
                distancer +=1
                reverse+=1

            if distancef < distancer:
                comboDict[keyf]+=1
            else:
                comboDict[keyr] +=1
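The two-directional search can be tried on a single sentence with the sketch below. It is a simplified variant under stated assumptions, not the loop above: nearest_character, tokens and names are hypothetical, the character lookup is a plain dictionary, and the scans stop at the ends of the token list instead of indexing past them.

```python
def nearest_character(tokens, spell_index, is_character, spell_length=1):
    """Return (name, distance) for the character token closest to a spell.

    tokens        -- list of word tokens
    spell_index   -- index of the first word of the spell
    is_character  -- function mapping a token to a character name, or None
    spell_length  -- number of tokens the spell occupies
    """
    best = None
    # Forward scan: start at the token right after the spell
    for offset, tok in enumerate(tokens[spell_index + spell_length:], start=1):
        name = is_character(tok)
        if name is not None:
            best = (name, offset)
            break
    # Reverse scan: start at the token right before the spell
    for offset, tok in enumerate(reversed(tokens[:spell_index]), start=1):
        name = is_character(tok)
        if name is not None:
            # On a tie the earlier character wins, like `distancef < distancer`
            if best is None or offset <= best[1]:
                best = (name, offset)
            break
    return best

# Hypothetical sentence and character lookup for illustration
tokens = "Ron ducked as Hermione cried Expelliarmus and the wand flew toward Harry".split()
names = {"Ron": "Ron Weasley", "Hermione": "Hermione Granger", "Harry": "Harry Potter"}

print(nearest_character(tokens, tokens.index("Expelliarmus"), names.get))
```

Here Hermione is two tokens before the spell and Harry is six tokens after it, so the spell is attributed to Hermione.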
Graphing the Network
The networkx module is used to create a graph with the characters as nodes, the interactions between characters and spells as edges, and, as weights, the number of times a character was the closest one to a spell. Because add_edge also creates any endpoint that does not exist yet, the spells themselves appear as nodes too. For instance, if Harry is closest to the spell Expecto Patronum multiple times, that edge will have a higher weight than the edge to another spell that Harry is only in the vicinity of once or twice.
charNodes = [key[1] for key in comboDict.keys() if comboDict[key]!=0] #character names as nodes
g = nx.Graph()
# add nodes
g.add_nodes_from(charNodes)
#add weighted edges
for key, value in comboDict.items():
    if value > 0:
        g.add_edge(key[0],key[1],weight=value)
# From pyvis
nt = net.Network(height=750, width="100%",bgcolor="#222222", font_color="white", notebook = True)
nt.from_nx(g)
nt.show("Network of Magic in Harry Potter.html")
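Before reading the interactive plot, it can help to tally how much edge weight each character carries. The sketch below does this with the standard library on a hypothetical dictionary shaped like comboDict. The counts in toy_combo_dict are invented for illustration and are not results from the books.

```python
from collections import Counter

# Hypothetical counts in the same shape as comboDict (values are made up)
toy_combo_dict = {
    ("Expelliarmus", "Harry Potter"): 12,
    ("Expecto Patronum", "Harry Potter"): 9,
    ("Expelliarmus", "Draco Malfoy"): 2,
    ("Accio", "Hermione Granger"): 0,
}

# Total weight of the edges touching each character
character_weight = Counter()
for (spell, character), count in toy_combo_dict.items():
    if count > 0:  # zero-count pairs never become edges in the graph
        character_weight[character] += count

print(character_weight.most_common())
```

On the real graph, the same per-node totals should be available from networkx through g.degree(weight="weight").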
Discussions
The results were generally more accurate than I expected. I will summarize some key points below:
Harry, Hermione and Ron, the three protagonists of the series, have the largest sub-networks of magic in the entire graph. This makes intuitive sense from a close-reading perspective - as protagonists they dominate a large chunk of the narrative across ALL 7 books in the series. At least one of them is bound to be present when some kind of event or orchestration of magic is taking place.
Harry is linked closely to both Expelliarmus and Expecto Patronum. These two spells play pivotal roles in the books. In fact, two very climactic moments of the series hinge on these spells. Furthermore, he is also linked strongly to Avada Kedavra even though he never uses it himself. Avada Kedavra is significant in his story in many ways. His parents were killed by that spell, and the same spell gave him the lightning-shaped scar on his forehead that linked him magically to Voldemort. The spell was responsible for the death of his godfather, Sirius Black, in front of his eyes, and Voldemort also uses it on Harry in the last book as he shows up to sacrifice himself.
The Sectumsempra spell network is a good example of how this network informs our understanding of the story. It connects to Harry, Draco, Snape and Ginny Weasley. All of them were involved with this spell intimately. Snape is the one who creates this spell, Harry uses it in a duel against Draco (who is then wounded) and Ginny witnesses the bloody affair. Although Snape is never shown using it himself, Sectumsempra is an integral part of his network of spells.
Thus, one of the main advantages of building the network this way lies in the fact that we can capture these unspoken relations by coding them as a metric of distance and weights.
Of course, this approach is not without its faults. Using distance to measure vicinity of magic means that occasionally, people who don’t practice magic also find themselves in its vicinity. Argus Filch’s network is such an example. Filch is what is called a “Squib” - a person born into a magical family but without magical powers. Thus, he cannot possibly use magic. However, his network does connect him to two spells used in his vicinity. This takes place when Harry, Hermione and Ron are hiding from him under the invisibility cloak, thus putting him in the vicinity of it.
Limitations
- Pronouns
One of the biggest limitations of this analysis was that it depended heavily on the names of characters being used in place of pronouns. However, novels tend to use a healthy mix of both in their sentences. Thus the “he”, “she”, “him”, “her”, “them” that were present closest to the spell, possibly even closer than the named instance of the character, were skipped by the code, and the person the pronoun referred to was not acknowledged as being closest to the spell.
To get around this, I tried using spacy as well as neuralcoref, but that consumed too much memory and would not allow the corpus to be processed in its entirety. Since my current method seemed to work relatively well, I decided to continue with it.
Conclusion and Future Directions
In terms of future direction, I think it would be interesting to try graphing networks of magic in individual Harry Potter books, or to look at networks associated with the magical creatures that I mentioned earlier. I hypothesize that Nagini the snake or Fawkes the Phoenix would have some interesting networks, since they play important roles in the series. Furthermore, dialogue attribution with spacy could yield interesting results, despite the memory constraints. Another interesting sub-project would be to characterize these magical networks not just by spell and character but also by the magical family the characters come from (Potter, Weasley, Dumbledore, Peverell etc.) and by their lineage, such as whether they are purebloods (both parents are descended from wizards), half-bloods (one side of the family has wizard blood) or “mudbloods” (a slur for a witch or wizard who comes from a family without any magical history - Hermione is one). It could yield interesting results about spell families, offensive vs. defensive usage etc.
This project was heavily inspired by Franco Moretti’s “Circle of Death” project using a social network derived from Shakespeare’s Hamlet. A distant reading of my childhood favorites definitely provided me with the “zoomed-out” look at the Harry Potter Universe that I have always craved. Lord Voldemort was surprisingly less connected to magic than I expected, and Accio (the summoning spell) was surprisingly extensive in its usage, especially since it’s not given particular importance. Legilimens (a kind of deep dive into the mind) was something I didn’t remember being used this often. These are just a few examples, but I do believe that looking at such a network for long enough will yield interesting insights into character sketches of individuals and the manner in which they use magic - like the fact that “Death Eaters” (followers of Voldemort) seem to generally be in the vicinity of offensive magic.
One of the biggest things that my time in research-based fields has taught me is that visualizing a particular concept in different ways, forms and media can be very beneficial in understanding it from different perspectives. That is precisely what I tried to achieve through this social network analysis of the interplay between characters and magic in the Harry Potter series. And I do believe that it has informed the way in which I now look at this modern magical classic.
Works Cited
Cdn.Shopify.Com, 2020, https://cdn.shopify.com/s/files/1/0599/9645/files/A_Wizards_Guide_to_Spells_ePDF.pdf. Accessed 2 Aug 2020.
Litlab.Stanford.Edu, 2020, https://litlab.stanford.edu/LiteraryLabPamphlet2.pdf. Accessed 28 July 2020.
“Formcept/Whiteboard”. Github, 2020, https://github.com/formcept/whiteboard/tree/master/nbviewer/notebooks/data.
“Interactive Network Visualizations — Pyvis 0.1.3.1 Documentation”. Pyvis.Readthedocs.Io, 2020, https://pyvis.readthedocs.io/en/latest/#.
“List Of Harry Potter Characters”. En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/List_of_Harry_Potter_characters. Accessed 5 Aug 2020.
“List Of Spells”. Harry Potter Wiki, 2020, https://harrypotter.fandom.com/wiki/List_of_spells. Accessed 2 Aug 2020.
Reeves, Jonathan. “Notes/20-Social-Networks.Ipynb · Master · Berkeleydigitalhumanitiessummer / Computational Literary Analysis”. Gitlab, 2020, https://gitlab.com/digitalhumanitiesatberkeley/computational-literary-analysis/-/blob/master/Notes/20-social-networks.ipynb.