Question

My program is supposed to print the 10 most common words in a file, but I can't get it to print them.

from string import *
file = open('shakespeare.txt').read().lower().split()

number_of_words = 0

onlyOneWord = []

for i in file:
    if i in onlyOneWord: continue
    else: onlyOneWord.append(i)
lot_of_words = {}

for all_Words in onlyOneWord:
    all_Words = all_Words.strip(punctuation)
    number_of_words = 0
    for orignal_file in file:
        orignal_file = orignal_file.strip(punctuation)
        if all_Words == orignal_file:
            number_of_words += 1
    lot_of_words[all_Words] = number_of_words

for x, y in sorted(lot_of_words.items()):
    print(max(y))

Right now it prints the contents of the whole file.

I need it to print the 10 most common words like this, and to run faster:

the: 251
apple: 234
etc.

Answer 1

You can do this easily with collections.Counter.most_common. I also used str.translate to remove the punctuation.

from collections import Counter
from string import punctuation

strip_punc = str.maketrans('', '', punctuation)

with open('shakespeare.txt') as f:
    wordCount = Counter(f.read().lower().translate(strip_punc).split())

print(wordCount.most_common(10))

This prints a list of (word, count) tuples:

[('the', 251), ('apple', 100), ...]
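If you want the word: count lines shown in the question instead of a list of tuples, you can unpack each pair returned by most_common. A minimal sketch; the top_words helper and the sample sentence are made up for illustration, and it works on a string so it runs without shakespeare.txt:

```python
from collections import Counter
from string import punctuation

# Translation table that deletes every ASCII punctuation character.
strip_punc = str.maketrans('', '', punctuation)

def top_words(text, n=10):
    """Return the n most common words in text as (word, count) pairs."""
    words = text.lower().translate(strip_punc).split()
    return Counter(words).most_common(n)

# Hypothetical sample text standing in for the file contents.
sample = "The apple, the pear. The APPLE!"
for word, count in top_words(sample, 2):
    print(f"{word}: {count}")  # prints "the: 3", then "apple: 2"
```

For the real file, call top_words(f.read()) inside the with open('shakespeare.txt') block.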

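Edit: We can speed this up a bit by letting the same translate call that removes punctuation also lowercase the letters, so the separate .lower() call is no longer needed.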

from string import punctuation, ascii_uppercase, ascii_lowercase

strip_punc = str.maketrans(ascii_uppercase, ascii_lowercase, punctuation)
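Putting the edit together: a minimal sketch that builds the table as str.maketrans(ascii_uppercase, ascii_lowercase, punctuation), so A-Z is mapped to a-z and punctuation is deleted in a single translate call, with no .lower(). The count_words helper and the sample line are illustrative. Note that this table only lowercases ASCII letters, whereas str.lower() also handles non-ASCII ones.

```python
from collections import Counter
from string import ascii_lowercase, ascii_uppercase, punctuation

# One table that maps A-Z to a-z and deletes ASCII punctuation,
# so a single translate() call replaces both .lower() and the strip.
strip_punc = str.maketrans(ascii_uppercase, ascii_lowercase, punctuation)

def count_words(text):
    """Count words after ASCII case-folding and punctuation removal."""
    return Counter(text.translate(strip_punc).split())

counts = count_words("To be, or not to be: THAT is the question.")
print(counts.most_common(2))  # [('to', 2), ('be', 2)]
```

Words with equal counts come back in the order they were first seen, which is why 'to' is listed before 'be'. For the real file, pass f.read() to count_words and call .most_common(10) on the result.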
