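I'm trying to count word frequencies in a text file with a Python function. I can get the frequency of every word, but I want the counts of only certain words, which I've put in a list. Here's what I have so far, but I'm stuck. My code: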
```python
def repeatedWords():
    with open(fname) as f:
        wordcount={}
        for word in word_list:
            for word in f.read().split():
                if word not in wordcount:
                    wordcount[word] = 1
                else:
                    wordcount[word] += 1
        for k,v in wordcount.items():
            print k, v
word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
repeatedWords('file.txt')
```
After the update, it still prints every word:
```python
def repeatedWords(fname, word_list):
    with open(fname) as f:
        wordcount = {}
        for word in word_list:
            for word in f.read().split():
                wordcount[word] = wordcount.get(word, 0) + 1
        for k,v in wordcount.items():
            print k, v
word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
repeatedWords('Emma.txt', word_list)
```
Answer 0 (score: 1)
So you only want the frequencies of the specific words in that list (Emma, Woodhouse, father, ...)? If so, this code might help (try running it):
```python
word_list = ['Emma','Woodhouse','father','Taylor','Miss','been','she','her']
# I'm using this example text in place of the file you are using
text = 'This is an example text. It will contain words you are looking for, like Emma, Emma, Emma, Woodhouse, Woodhouse, Father, Father, Taylor,Miss,been,she,her,her,her. I made them repeat to show that the code works.'
text = text.replace(',',' ') # these statements remove irrelevant punctuation
text = text.replace('.','')
text = text.lower() # this makes all the words lowercase, so that capitalization won't affect the frequency measurement
for repeatedword in word_list:
    counter = 0 # counter starts at 0
    for word in text.split():
        if repeatedword.lower() == word:
            counter = counter + 1 # add 1 every time there is a match in the list
    print(repeatedword,':', counter) # prints the word from 'word_list' and its frequency
```
The output shows the frequencies of only the words in the list you provided. Is that what you wanted?

Running it in Python 3 produces this output:
```
Emma : 3
Woodhouse : 2
father : 2
Taylor : 1
Miss : 1
been : 1
she : 1
her : 3
```
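For comparison, here is a more compact sketch of the same counting built on `collections.Counter` from the standard library. This is a swapped-in technique, not what the answer above uses. It assumes a "word" is a run of letters and apostrophes, so a single regular expression takes the place of the two `replace` calls, and `count_selected_words` is a hypothetical helper name. On the example text it should produce the same counts as the output above.

```python
import re
from collections import Counter


def count_selected_words(text, word_list):
    """Hypothetical helper: case-insensitive counts of each word in word_list."""
    # Lowercase first, then keep only runs of letters/apostrophes, so tokens
    # like "Emma," or "works." lose their punctuation before matching.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never appear, so no KeyError here.
    return {word: counts[word.lower()] for word in word_list}


word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
text = ('This is an example text. It will contain words you are looking for, like '
        'Emma, Emma, Emma, Woodhouse, Woodhouse, Father, Father, Taylor,Miss,been,'
        'she,her,her,her. I made them repeat to show that the code works.')

for word, n in count_selected_words(text, word_list).items():
    print(word, ':', n)
```

To count from a file as in the question, pass its contents instead, e.g. `with open('Emma.txt') as f: counts = count_selected_words(f.read(), word_list)`. One caveat of this regular expression is that hyphenated words such as `well-known` are split into two tokens.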
Answer 1 (score: 0)
The best way to handle this is with the `get` method of Python dictionaries. It could look like this:
```python
def repeatedWords(fname):
    wordcount = {}
    # Example list of words not needed
    nonwordlist = ['father', 'Miss', 'been']
    with open(fname) as f:
        # Read the file once; a second f.read() would return ''
        for word in f.read().split():
            if word not in nonwordlist:
                wordcount[word] = wordcount.get(word, 0) + 1
    return wordcount

# Put these outside the function repeatedWords
wordcount = repeatedWords('file.txt')
for k, v in wordcount.items():
    print(k, v)
```
The print statement should give you:
```python
word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
newDict = {}
for newWord in word_list:
    newDict[newWord] = newDict.get(newWord, 0) + 1
print(newDict)
```
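What the line `wordcount[word] = wordcount.get(word, 0) + 1` does is first look up `word` in the dictionary `wordcount`. If the word is already there, `get` returns its current value and `1` is added to it. If `word` is not there, the value defaults to `0`, and adding `1` makes this the word's first occurrence, with a count of 1.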
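The behaviour described above can be traced step by step. This is a minimal sketch: the sample sentence and the shortened `word_list` in the last part are hypothetical, and the final loop shows the same `get` pattern restricted to the listed words (an inclusion filter rather than the exclusion list used in this answer).

```python
wordcount = {}

# First sighting of 'Emma': the key is missing, so get() returns the default 0.
wordcount['Emma'] = wordcount.get('Emma', 0) + 1
print(wordcount)                      # {'Emma': 1}

# Second sighting: the key exists, so get() returns its current value, 1.
wordcount['Emma'] = wordcount.get('Emma', 0) + 1
print(wordcount)                      # {'Emma': 2}

# get() neither raises KeyError nor inserts the key when it is missing;
# plain indexing, wordcount['Woodhouse'], would raise KeyError here.
print(wordcount.get('Woodhouse', 0))  # 0
print('Woodhouse' in wordcount)       # False

# Same pattern, counting only the words in a (hypothetical) word_list.
word_list = ['Emma', 'Woodhouse', 'father', 'her']
wanted = set(word_list)               # fast membership tests
selected = {}
for word in 'Emma told her father that Emma was happy'.split():
    if word in wanted:
        selected[word] = selected.get(word, 0) + 1
print(selected)                       # {'Emma': 2, 'her': 1, 'father': 1}
```

Words from the list that never occur (here `Woodhouse`) are simply absent from the result, so `selected.get(w, 0)` is a convenient way to report them as zero.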