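I'm new to Python and I'm trying to record word frequencies in a library of datasets. This is what I have, and it tells me it can't assign to literal at line 20.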
import movie_scripts
import matplotlib.pyplot as plt
all_movies = movie_scripts.get_all_movies()
romeo = (all_movies[1]['lines']['all'])
tokens = WSTokenizer().tokenize(romeo)
male_words = set(['man','men',"man's", "men's", 'mr', 'mister', 'he', "he's", 'his', 'him', 'boy',"boys", 'guy', 'guys', 'brother', 'brothers', 'father', 'fathers', 'dad', 'dads', 'grandpa', 'grandpas', 'grandfather', 'boyfriend', 'boyfriends', 'uncle', 'uncles', 'mr', 'sir', 'sirs', 'son', 'sons', 'king', 'kings', 'prince', 'princes', 'daddy', 'daddies', "daddy's", 'chairman', 'chairmen', 'countryman', 'countrymen', 'doorman', 'doormen', 'waiter', 'waiters', 'stud', 'studs', 'son of a bitch', 'sons of bitches', 'bro', 'bros', 'dude', 'dudes', "dude's", 'actor', 'actors', 'god', 'gods', "god's", 'husband', 'husbands', "husband's", 'himself', 'lord', 'lords', 'knight', 'knights', 'groom', 'grooms', "groom's"])
female_words = set(['woman', 'women', 'girl', 'girls', 'she', 'ms', 'her', "she's", "her's", 'lady', 'ladies', 'bitch', "bitch's", 'bitches', 'mom', 'mother', 'moms', 'mothers', "mom's", "mother's", 'grandmom', 'grandmas', 'grandmother', 'grandmothers', 'granddaughter', 'granddaughters', 'aunt', 'aunts', "ma'am", 'madame', 'daughter', 'daughters', 'sister', 'sisters', 'queen', 'queens', 'princess', 'princesses', 'mommy', 'mommies', "mommy's", 'waitress', 'waitresses', 'babe', 'babes', 'damsel', 'damsels', 'bird', 'birds', 'girlfriend', 'girlfriends', "girlfriend's", 'actress', 'actresses', 'goddess', 'goddesses', 'gal', 'gals', 'wife', 'wives', 'herself', 'dame', 'dames', 'bride', 'brides', "bride's"])
ended_with_male_words = 0
freq_dist = FreqDist()
for token in tokens:
    if ended_with_male_words:
        freq_dist.inc(len(token.type()))
    ended_with_male_words = token.type()[-1].lower() in male_words
wordlens = freq_dist.samples()
wordlens.sort()
points = [(1, freq_dist.freq(1)) for 1 in wordlens]
Plot(points)
Thanks for your help.
Answer 0 (score: 3)
Change the line
points = [(1, freq_dist.freq(1)) for 1 in wordlens]
to
points = [(tmp, freq_dist.freq(tmp)) for tmp in wordlens]
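unless the 1s are really the letter l, in which case this is a good example of how single-letter variable names cause problems. 1 is not a valid variable name in Python, so the interpreter cannot assign any of the values in wordlens to it. Instead, the interpreter treats 1 as a numeric literal, i.e. the number one.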
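To see this for yourself without running the whole script, here is a minimal sketch that only compiles the two versions of the line. The helper name compiles is made up for this illustration, and the exact wording of the error message varies between Python versions, so it is printed rather than assumed.

```python
# The line from the question, and a version with a real variable name.
broken = "points = [(1, freq_dist.freq(1)) for 1 in wordlens]"
fixed = "points = [(wordlen, freq_dist.freq(wordlen)) for wordlen in wordlens]"


def compiles(source):
    """Return True if `source` is syntactically valid Python (hypothetical helper)."""
    try:
        # compile() only parses the code; freq_dist and wordlens need not exist.
        compile(source, "<example>", "exec")
        return True
    except SyntaxError as err:
        print("SyntaxError:", err.msg)
        return False


broken_ok = compiles(broken)  # rejected: the loop target is a literal
fixed_ok = compiles(fixed)    # accepted: the loop target is a name
```

The error is raised at compile time, before any of the script runs, which is why Python reports it even though the earlier lines have not executed yet.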
Answer 1 (score: 2)
You can't use the line:
points = [(1, freq_dist.freq(1)) for 1 in wordlens]
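because you are effectively trying to assign each item in wordlens to the number 1. A number cannot be used as a variable name, hence the error. You can fix it by using a variable, for example: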
points = [(wordlen, freq_dist.freq(wordlen)) for wordlen in wordlens]
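If you want to try the corrected comprehension without the question's dataset, the sketch below uses collections.Counter from the standard library as a stand-in for FreqDist. The sample lengths are invented for illustration, and the freq helper assumes that freq_dist.freq() returns a count divided by the total number of samples.

```python
from collections import Counter

# Hypothetical word lengths, standing in for the lengths counted from the script.
observed_lengths = [3, 5, 3, 4, 3, 5]

counts = Counter(observed_lengths)
total = sum(counts.values())


def freq(length):
    """Relative frequency of a length (assumed to mirror freq_dist.freq())."""
    return counts[length] / total


# Sorted distinct lengths, like wordlens after wordlens.sort() in the question.
wordlens = sorted(counts)

# The corrected comprehension: the loop target is a name, not the literal 1.
points = [(wordlen, freq(wordlen)) for wordlen in wordlens]
print(points)
```

Each tuple pairs a word length with its share of the total, which is the shape of data the question passes to Plot.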