Question

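I am trying to compute the frequency of words in a text file using a Python function. I can get the frequency of every word, but I want counts only for specific words that I put in a list. This is what I have so far, and I am stuck: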

def repeatedWords():
    with open(fname) as f:
        wordcount={}
        for word in word_list:
            for word in f.read().split():
                if word not in wordcount:
                    wordcount[word] = 1
                else:
                    wordcount[word] += 1
            for k,v in wordcount.items():
                 print k, v

word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
repeatedWords('file.txt')

After updating, it still shows all words:

def repeatedWords(fname, word_list):
    with open(fname) as f:
        wordcount = {}
        for word in word_list:
            for word in f.read().split():
                wordcount[word] = wordcount.get(word, 0) + 1


    for k,v in wordcount.items():
        print k, v

word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
repeatedWords('Emma.txt', word_list)

Answer 1

So you only want the frequency of the specific words in that list (Emma, Woodhouse, father, ...)? If so, this code may help (try running it):

    word_list = ['Emma','Woodhouse','father','Taylor','Miss','been','she','her']
    #i'm using this example text in place of the file you are using
    text = 'This is an example text. It will contain words you are looking for, like Emma, Emma, Emma, Woodhouse, Woodhouse, Father, Father, Taylor,Miss,been,she,her,her,her. I made them repeat to show that the code works.'
    text = text.replace(',',' ') #these statements remove irrelevant punctuation
    text = text.replace('.','')
    text = text.lower() #this makes all the words lowercase, so that capitalization wont affect the frequency measurement

    for repeatedword in word_list:
        counter = 0 #counter starts at 0
        for word in text.split():
            if repeatedword.lower() == word:
                counter = counter + 1 #add 1 every time there is a match in the list
        print(repeatedword,':', counter) #prints the word from 'word_list' and its frequency

The output shows the frequency of only the words in the list you provided. Is that what you wanted?

The output when run in Python 3 is:

    Emma : 3
    Woodhouse : 2
    father : 2
    Taylor : 1
    Miss : 1
    been : 1
    she : 1
    her : 3
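
If you would rather not strip each punctuation mark by hand, one possible variation is to tokenize with a regular expression and let `collections.Counter` do the counting. This is only a sketch: the function name `count_listed_words` and the token pattern are my own assumptions, not something from your code, and the pattern only handles plain ASCII letters and apostrophes.

```python
import re
from collections import Counter


def count_listed_words(text, word_list):
    """Count case-insensitive occurrences of each word in word_list within text."""
    # Keep runs of letters/apostrophes so commas and periods are ignored.
    tokens = re.findall(r"[a-z']+", text.lower())
    totals = Counter(tokens)
    # A Counter returns 0 for missing keys, so absent words are reported as 0.
    return {word: totals[word.lower()] for word in word_list}


sample = "Emma, Emma, Emma, Woodhouse, Woodhouse, Father, Father, Taylor,Miss,been,she,her,her,her."
word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
for word, count in count_listed_words(sample, word_list).items():
    print(word, ':', count)
```

To use it on a file, read the contents first, for example `with open('Emma.txt') as f: counts = count_listed_words(f.read(), word_list)`. The text is scanned once, instead of once per word in the list.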

Answer 2

The best way to handle this is to use the get method of Python dictionaries. It could look like this:

def repeatedWords(fname):
    with open(fname) as f:
        words = f.read().split()  # read the file only once
    wordcount = {}
    # Example list of words not needed
    nonwordlist = ['father', 'Miss', 'been']
    for word in words:
        if word not in nonwordlist:
            # get returns 0 the first time a word is seen
            wordcount[word] = wordcount.get(word, 0) + 1
    return wordcount


# Put these outside the function repeatedWords
wordcount = repeatedWords('Emma.txt')
for k, v in wordcount.items():
    print(k, v)

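The print statement should then show each word with its count. The same get pattern applied to a plain list looks like this: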

word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
newDict = {}
for newWord in word_list:
    newDict[newWord] = newDict.get(newWord, 0) + 1

print(newDict)

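What the line wordcount[word] = wordcount.get(word, 0) + 1 does is first look up word in the dictionary wordcount. If the word is already there, it gets its current value and adds 1. If word is not there, the value defaults to 0, and adding 1 makes this the first occurrence of that word, with a count of 1.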
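
To count only the words you care about, the same get pattern can be combined with a membership test. The following is a minimal sketch under my own assumptions: `count_wanted` is a hypothetical helper name, and the sample sentence is made up for illustration rather than taken from your file.

```python
def count_wanted(words, wanted):
    """Count only the words that appear in wanted, using dict.get."""
    wanted = set(wanted)  # set membership tests are fast
    wordcount = {}
    for word in words:
        if word in wanted:
            # get returns 0 the first time a word is seen, so no KeyError
            wordcount[word] = wordcount.get(word, 0) + 1
    return wordcount


words = "she saw her father and her friend Emma".split()
print(count_wanted(words, ['Emma', 'father', 'her', 'Taylor']))
# prints {'her': 2, 'father': 1, 'Emma': 1}
```

Words from the list that never occur (here 'Taylor') simply do not appear in the result. The comparison is case-sensitive, so lowercase both sides first if 'Father' and 'father' should count as the same word.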
