Question

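I am trying to read a text file and then print out all of its words, with the most frequently used word at the top and the counts decreasing as you go down the list. I am using Python 3.3.2.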

def wordCounter(thing):
# Open a file
    file = open(thing, "r+")
    newWords={}
    for words in file.read().split():
        if words not in newWords:
            newWords[words] = 1
        else:
            newWords[words] += 1

    for k,v in frequency.items():
        print (k, v)
    file.close()

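Right now it does print everything out the *way* I want, except that some words lower in the list are used far more often than words above them. I tried using newWords.sort(), but it says: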

"AttributeError: 'dict' object has no attribute 'sort'"

So I am at a loss, since my knowledge is very limited.

Answer 1

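Don't reinvent the wheel: collections.Counter does the counting, and its .most_common() method does the sorting, giving you the words in order from most common to least common: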

from collections import Counter
def wordCounter(thing):
   with open(thing) as f:
       cn = Counter(w for line in f for w in line.split())
       return cn.most_common()
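
To see what most_common() returns without needing a file on disk, here is a minimal sketch. The helper name count_words and the sample text are made up for illustration; io.StringIO simply stands in for an open file.

```python
from collections import Counter
import io

def count_words(lines):
    """Count whitespace-separated words from any iterable of lines."""
    return Counter(w for line in lines for w in line.split())

# io.StringIO stands in for an open file so the sketch needs no file on disk.
sample = io.StringIO("the cat saw the dog\nthe dog ran\n")
counts = count_words(sample)

# most_common() yields (word, count) pairs, highest count first.
for word, n in counts.most_common():
    print(word, n)

# Passing a number limits the result to that many entries.
top_two = counts.most_common(2)
```

Passing an argument, as in most_common(2), should be handy if you only want the top few words rather than the whole list.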

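You also don't need to read the whole file into memory; you can iterate over it line by line and split each line. You will also have to account for punctuation, which you can remove with str.strip: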

from collections import Counter
from string import punctuation

def wordCounter(thing):
    with open(thing) as f:
        # strip() removes punctuation only from the ends of each word
        cn = Counter(w.strip(punctuation) for line in f for w in line.split())
        return cn.most_common()
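
A small sketch of what str.strip(punctuation) does and does not do, using a made-up word list. Note that it only trims characters from both ends of each token, so punctuation inside a word is kept, and a token made entirely of punctuation turns into an empty string that you may want to filter out.

```python
from string import punctuation

# Hypothetical tokens, as they might come out of line.split().
words = ["Hello,", "(world)", "don't", "--", "end."]

# strip() only trims the ends, so the apostrophe inside "don't" survives,
# and "--" collapses to an empty string.
cleaned = [w.strip(punctuation) for w in words]

# Drop the empty strings left behind by punctuation-only tokens.
cleaned_nonempty = [w for w in cleaned if w]
print(cleaned_nonempty)
```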

Answer 2

To print the most frequently used words first:

from operator import itemgetter

for k, v in sorted(frequency.items(), key=itemgetter(1), reverse=True):
    print(k, v)

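key is a function that sorted() calls on each item to get the value to sort by. In our case, itemgetter(1) retrieves the second element of each (word, count) pair, i.e. the frequency, as the sort criterion.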

An alternative without the import:

for k, v in sorted(frequency.items(), key=lambda x: x[1], reverse=True):
    print(k, v)
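
Here is a self-contained sketch of both ideas on a made-up frequency dictionary. It also shows one possible way to break ties alphabetically, which is an addition of mine and not something the question asked for: sort on a tuple of the negated count and the word.

```python
from operator import itemgetter

# Hypothetical counts standing in for the dictionary built from the file.
frequency = {"dog": 2, "the": 3, "cat": 1, "ant": 2}

# Highest count first; words with equal counts keep their original order.
by_count = sorted(frequency.items(), key=itemgetter(1), reverse=True)

# Optional: highest count first, ties broken alphabetically by word.
by_count_then_word = sorted(frequency.items(), key=lambda kv: (-kv[1], kv[0]))

for k, v in by_count_then_word:
    print(k, v)
```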

Answer 3

You can try this approach:

from collections import Counter

with open('file_name.txt') as f:
    c=Counter(f.read().split())
    print(c.most_common())

Answer 4

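Dictionaries do not have a sort() method. However, you can pass the dictionary to the built-in function sorted(), which produces a list of the dictionary's keys. As the sort key, use a function that returns the value for a given dictionary key, i.e. the dictionary's get() method.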

# reverse=True puts the most frequent words first
for key in sorted(newWords, key=newWords.get, reverse=True):
    print(key, newWords[key])

Also, it looks like you have been doing some refactoring, because frequency is not defined anywhere in your code.
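
Putting those two points together, here is a rough sketch of how the function from the question might look with a single dictionary name and a sorted() call. The name word_counter, the return value, and the temporary sample file are my own choices for the demo, not part of the original code.

```python
import os
import tempfile

def word_counter(path):
    """Return (word, count) pairs, most frequent first."""
    new_words = {}
    with open(path) as f:
        for word in f.read().split():
            # get() returns 0 for words not seen yet
            new_words[word] = new_words.get(word, 0) + 1
    # Sort the keys by their counts, highest count first.
    ordered = sorted(new_words, key=new_words.get, reverse=True)
    return [(word, new_words[word]) for word in ordered]

# Hypothetical sample input written to a temporary file for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a b a c a b")
result = word_counter(tmp.name)
os.remove(tmp.name)

for word, count in result:
    print(word, count)
```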

Answer 5

If you want to sort without any imports:

word_count = sorted(new_words.items(), key=lambda x: x[1], reverse=True)

Note: using a regular expression to pick out the words is a better approach:

import re
from collections import defaultdict

word_count = defaultdict(int)
# A word is a letter followed by any number of letters or digits
pattern = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")
with open("file.txt") as file:
    for line in file:
        for word in pattern.findall(line):
            word_count[word] += 1
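
To get a feel for what that pattern actually matches, here is a minimal sketch on a made-up string instead of a file. It suggests two things to watch for: matching is case-sensitive, so "Hello" and "hello" are counted separately, and an apostrophe splits a word in two.

```python
import re
from collections import defaultdict

pattern = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")

# Hypothetical sample text standing in for one line of the file.
text = "Hello, world! hello it's x2"
words = pattern.findall(text)

word_count = defaultdict(int)
for word in words:
    word_count[word] += 1

# Sort the (word, count) pairs by count, most frequent first.
for word, count in sorted(word_count.items(), key=lambda x: x[1], reverse=True):
    print(word, count)
```

If case should not matter, one option is to call text.lower() before matching.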

Python word counter w/ sorted frequency
