Question

I have a text file and I'm trying to get the most frequently used words. I'm using Counter, but it seems to return 1 for every word.

I'm still learning, so I'm using the Simple Sabotage Field Manual as my text file.

import re
from collections import Counter
my_file = "fieldManual.txt"

#### GLOBAL VARIABLES
lst = [] # used in unique_words
cnt = Counter()

#########

def clean_word(the_word):
    #new_word = re.sub('[^a-zA-Z]', '',the_word)
    new_word = re.sub('^[^a-zA-Z]*|[^a-zA-Z]*$', '', the_word)
    return new_word

def unique_words():
    with open(my_file, encoding="utf8") as infile:
        for line in infile:
            words = line.split()
            for word in words:
                edited_word = clean_word(word)
                if edited_word not in lst:
                    lst.append(edited_word)
                    cnt[edited_word] += 1
    lst.sort()  
    word_count = Counter(lst)
    return(lst)
    return (cnt)

unique_words()
test = ['apple','egg','apple','banana','egg','apple']
print(Counter(lst)) # returns '1' for everything
print(cnt) # same here

So print(Counter(test)) correctly returns

Counter({'apple': 3, 'egg': 2, 'banana': 1})

but when I try to print the most common words in lst, I get

Counter({'': 1, 'A': 1, 'ACTUAL': 1, 'AGREE': 1, 'AGREEMENT': 1, 'AK': 1, 'AND': 1, 'ANY': 1, ..., 'AR': 1, 'AS-IS': 1, 'ASCII': 1, 'About': 1, 'Abstract': 1, 'Accidentally': 1, 'Action': 1, 'Acts': 1, 'Add': 1, 'Additional': 1, 'Adjust': 1, 'Advocate': 1, ..., 'Agriculture': 1, ...

Following an answer from here, I tried adding cnt.update(edited_word) inside the if edited_word not in lst: block, but then printing cnt gives me only single characters:

Counter({'e': 2401, 'i': 1634, 't': 1470, ..., 'n': 1455, ..., 'a': 1407, 'o': 1244, 'l': 948, ..., 'd': 752, 'u': 651, 'p': 590, 'g': 564, 'm': 436, ...

How do I get the frequency of each unique word in a .txt file?
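The single-character counts come from how Counter.update treats its argument: it counts the items of an iterable, and a string is an iterable of characters. A minimal sketch of the difference, using a small hypothetical word list instead of the field manual:

```python
from collections import Counter

cnt_chars = Counter()
cnt_words = Counter()
for word in ["apple", "egg", "apple"]:
    cnt_chars.update(word)    # a str is iterated character by character
    cnt_words.update([word])  # wrapping it in a list counts the whole word
    # equivalently: cnt_words[word] += 1

print(cnt_chars["p"], cnt_chars["apple"])  # 4 0
print(cnt_words["apple"], cnt_words["p"])  # 2 0
```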

Answer 1

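You only append a word to the list, and only increment its count, if it hasn't been seen before. As a result, every word appears exactly once.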
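To illustrate this answer, here is a minimal sketch comparing the question's pattern (increment inside the `not in` check) with one that increments on every occurrence. The word list reuses the question's `test` list; the variable names are hypothetical.

```python
from collections import Counter

words = ["apple", "egg", "apple", "banana", "egg", "apple"]

# Question's pattern: the increment only runs the first time a word is seen,
# so every count stays at 1.
seen = []
cnt_once = Counter()
for w in words:
    if w not in seen:
        seen.append(w)
        cnt_once[w] += 1

# Fixed pattern: count every occurrence; the list is only kept for uniqueness.
seen = []
cnt = Counter()
for w in words:
    cnt[w] += 1
    if w not in seen:
        seen.append(w)

print(cnt_once)  # Counter({'apple': 1, 'egg': 1, 'banana': 1})
print(cnt)       # Counter({'apple': 3, 'egg': 2, 'banana': 1})
```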

Answer 2

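There are a few problems here. You should increment the counter regardless of whether the word is already in the list, or simply call Counter on the list of words you get from splitting the lines. You have back-to-back return statements (the second one is never executed). You compute word_count from the list and then ignore it (it would also be 1 for every word, since the list holds each word only once). Just cleaning up this code may help solve the problem.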
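A minimal sketch of the second suggestion: call Counter directly on the cleaned, split words and return it once. The function name word_frequencies and the sample lines are hypothetical; it accepts any iterable of lines (an open file works) so it runs without fieldManual.txt, and it skips the empty strings that punctuation-only tokens clean down to (the '' entry in the question's output).

```python
import re
from collections import Counter

def clean_word(the_word):
    # Strip leading and trailing non-letter characters.
    return re.sub(r'^[^a-zA-Z]*|[^a-zA-Z]*$', '', the_word)

def word_frequencies(lines):
    """Count every cleaned word in an iterable of lines (e.g. an open file)."""
    words = (clean_word(w) for line in lines for w in line.split())
    # Skip tokens that were pure punctuation and cleaned down to ''.
    return Counter(w for w in words if w)

# Usage with the question's file:
# with open("fieldManual.txt", encoding="utf8") as infile:
#     cnt = word_frequencies(infile)
# print(cnt.most_common(10))

sample = ["the apple, the egg.", "The apple -- apple!"]
cnt = word_frequencies(sample)
print(cnt.most_common(2))  # [('apple', 3), ('the', 2)]
```

Counting is case-sensitive here, so 'The' and 'the' are separate keys; applying .lower() to each word before counting would merge them.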

Counter() returns 1 for all words. How do I get the actual counts?
