CPSC 203, 2025 W1
December 2, 2025
Data flow:

Thirty most common tokens in the mystery text.
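A list like this one can be produced with a few lines of Python. The sketch below is only an illustration: the function name most_common_tokens and the simple regular-expression tokenizer are assumptions, not the code used to make the original list.

```python
import re
from collections import Counter


def most_common_tokens(text, n=30):
    """Return the n most frequent lowercase word tokens with their counts."""
    # Assumption: a token is a run of letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(n)


sample = "the cat and the hat and the bat"
print(most_common_tokens(sample, 2))  # → [('the', 3), ('and', 2)]
```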

Named entity recognition (NER) is essentially tagging proper nouns.
NER is used for inferring relationships between entities:

Harriet Smith’s intimacy at Hartfield was soon a settled thing. Quick and decided in her ways, Emma lost no time in inviting, encouraging, and telling her to come very often; and as their acquaintance increased, so did their satisfaction in each other. As a walking companion, Emma had very early foreseen how useful she might find her. In that respect Mrs. Weston’s loss had been important. Her father never went beyond the shrubbery, where two divisions of the ground sufficed him for his long walk, or his short, as the year varied; and since Mrs. Weston’s marriage her exercise had been too much confined. She had ventured once alone to Randalls, but it was not pleasant; and a Harriet Smith, therefore, one whom she could summon at any time to a walk, would be a valuable addition to her privileges. But in every respect, as she saw more of her, she approved her, and was confirmed in all her kind designs.
Typical categories of entities are PERSON, LOCATION, ORGANIZATION. Think about how you might discover each of the entities using a program.
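One naive starting point, before using any NLP library, is to collect runs of capitalized words that do not begin a sentence. The sketch below is a heuristic under stated assumptions: the function name capitalized_runs and the small TITLES set are made up for illustration, and the approach cannot tell a PERSON from a LOCATION.

```python
# Assumption: a short, hand-picked list of titles that end in a period
# but do not end a sentence.
TITLES = {"Mr.", "Mrs.", "Ms.", "Dr."}


def capitalized_runs(text):
    """Collect runs of capitalized words that are not sentence-initial."""
    runs, current = [], []
    at_sentence_start = True
    for raw in text.split():
        is_title = raw in TITLES
        word = raw if is_title else raw.strip(".,;:!?\"“”")
        # Drop a possessive ending such as Weston’s -> Weston.
        for suffix in ("’s", "'s"):
            if word.endswith(suffix):
                word = word[: -len(suffix)]
        if word[:1].isupper() and not at_sentence_start:
            current.append(word)
        elif current:
            runs.append(" ".join(current))
            current = []
        # Punctuation right after a word closes the current run.
        if not is_title and raw[-1] in ".,;:!?" and current:
            runs.append(" ".join(current))
            current = []
        at_sentence_start = (not is_title) and raw[-1] in ".!?"
    if current:
        runs.append(" ".join(current))
    return runs


text = (
    "As a walking companion, Emma had very early foreseen how useful she "
    "might find her. In that respect Mrs. Weston’s loss had been important. "
    "She had ventured once alone to Randalls, but it was not pleasant."
)
print(capitalized_runs(text))  # → ['Emma', 'Mrs. Weston', 'Randalls']
```

The heuristic misses any name that opens a sentence and gives no category, which is why a trained NER model is the more practical tool.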
ner_nb.py. Modify and execute this file to answer the following questions. In each case, sketch an example of the output, and explain it briefly in English.
a. Load an OFK excerpt from ofk_ch1Short.txt.
b. Create a spaCy doc object with doc = nlp(textRaw).
c. If doc is the result of part b, what does sents = list(doc.sents) do?
d. If sents is the result of part c, what does sentWords = [[token.text for token in sent] for sent in sents] do?
e. If sents is the result of part c, what does sentWordsPOS = [[(token.text, token.pos_) for token in sent] for sent in sents] do?
f. If sents is the result of part c, what does sentWordsNER = [[(token.text, token.ent_iob_, token.ent_type_) for token in sent] for sent in sents] do?
g. If sents is the result of part c, what does ents_by_sent = [[(ent.text, ent.label_) for ent in sent.ents] for sent in sents] do?
h. If doc is the result of part b, what does names = [ent.text for ent in doc.ents if ent.label_ == "PERSON"] do?
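To see how the token-level IOB tags relate to whole entities, the sketch below groups (text, ent_iob_, ent_type_) triples into (entity text, label) pairs without needing spaCy installed. The triples are hand-written for illustration and are not real model output; a real model may tokenize or label differently. The function name group_entities is an assumption.

```python
# Hand-written triples in the shape of one inner list of sentWordsNER.
# Assumption: these tags are illustrative, not produced by a spaCy model.
tagged = [
    ("Harriet", "B", "PERSON"),   # B = begins an entity
    ("Smith", "I", "PERSON"),     # I = inside the same entity
    ("’s", "O", ""),              # O = outside any entity
    ("intimacy", "O", ""),
    ("at", "O", ""),
    ("Hartfield", "B", "LOC"),
]


def group_entities(triples):
    """Merge B/I-tagged tokens into (entity_text, label) pairs."""
    entities = []
    words, label = [], None
    for text, iob, ent_type in triples:
        if iob == "B":
            if words:
                entities.append((" ".join(words), label))
            words, label = [text], ent_type
        elif iob == "I" and words:
            words.append(text)
        else:
            if words:
                entities.append((" ".join(words), label))
            words, label = [], None
    if words:
        entities.append((" ".join(words), label))
    return entities


ents = group_entities(tagged)
print(ents)   # → [('Harriet Smith', 'PERSON'), ('Hartfield', 'LOC')]

# Same filter as the names comprehension, applied to the grouped pairs.
names = [text for text, label in ents if label == "PERSON"]
print(names)  # → ['Harriet Smith']
```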
Suppose we’d like to understand the bonds between pairs of people in a book!

Given the text from a novel, how can we infer interaction or connections between characters? Discuss this question, and write down your ideas.
________________________________________
________________________________________
________________________________________
________________________________________
The social network graph will have…
Vertices:
Edges:
We could consider every pair of people and check every paragraph for their presence.
OR, we could
"I've heard of his family," said Ron darkly. "They were some of the first to come back to our side after You-Know-Who disappeared. Said they'd been bewitched. My dad doesn't believe it. He says Malfoy's father didn't need an excuse to go over to the Dark Side." He turned to Hermione. "Can we help you with something?"
Given [[RW, HG, HP], [RW, AD], [H, HP, HG]],
Vertices: [RW, HG, HP, AD, H]?
Edges: {(RW,HG):1, (RW,HP):1, (HG,HP):2, (RW,AD):1, (H,HP):1, (H,HG):1}?
In the PL activity, load socnet_nb.py.
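The vertex list and edge counts for the example above can be checked with a short sketch that walks through the paragraphs once and counts every pair of people appearing together. This is one possible approach, not necessarily what socnet_nb.py does; the names paragraph_people and build_network are assumptions. Each pair is stored in alphabetical order so that (RW, HG) and (HG, RW) count as the same edge.

```python
from collections import Counter
from itertools import combinations

# One inner list per paragraph, holding the people found in it.
paragraph_people = [["RW", "HG", "HP"], ["RW", "AD"], ["H", "HP", "HG"]]


def build_network(paragraphs):
    """Return (vertices, edges) for a co-occurrence network."""
    vertices = []
    edges = Counter()
    for people in paragraphs:
        unique = list(dict.fromkeys(people))  # drop repeats, keep order
        for person in unique:
            if person not in vertices:
                vertices.append(person)
        # Every unordered pair in the same paragraph adds 1 to that edge.
        for a, b in combinations(sorted(unique), 2):
            edges[(a, b)] += 1
    return vertices, edges


vertices, edges = build_network(paragraph_people)
print(vertices)            # → ['RW', 'HG', 'HP', 'AD', 'H']
print(edges[("HG", "HP")])  # → 2
```

Counting per paragraph avoids rescanning the whole text once for every possible pair of people.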