Question

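def word_count(filename):
    # The original post starts mid-function; this def line and the open() call
    # are reconstructed and are assumptions.
    wordcount = {}
    with open(filename) as file:
        for vocab in file.read().split():
            if vocab not in wordcount:
                wordcount[vocab] = 1
            else:
                wordcount[vocab] = wordcount[vocab] + 1
    for word, number in wordcount.items():
        print(word, number)

word_count("license.txt")  # assumed file name, taken from the answer; the post called word_count(0)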

Answer 1

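As PM 2Ring says, a Counter object is useful here, or simply a defaultdict from the collections library. We can use the regular expression package re to get a more powerful re.split(), or simply re.findall():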

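from re import findall, IGNORECASE
from operator import itemgetter
from collections import defaultdict

# Missing keys start at 0, so no membership test is needed.
wordcount = defaultdict(int)

# Close the file automatically once it has been read.
with open("license.txt") as file:
    text = file.read()

# Only runs of letters match, so punctuation is dropped.
for vocab in findall(r"[A-Z]+", text, flags=IGNORECASE):
    wordcount[vocab.lower()] += 1

# Print the most frequent words first.
for word, number in sorted(wordcount.items(), key=itemgetter(1), reverse=True):
    print(word, number)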

Output

> python3 test.py
the 77
or 54
of 48
to 47
software 44
and 36
any 36
for 23
license 22
you 20
this 19
agreement 18
be 17
by 16
in 16
other 14
may 13
use 11
not 10
that 10
...
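
The Counter suggestion credited to PM 2Ring could look roughly like the following. This is a minimal sketch rather than the answerer's code: the helper name count_words and the sample string are made up for illustration, and it assumes the same [A-Z]+ pattern.

```python
from collections import Counter
from re import findall, IGNORECASE

def count_words(text):
    # Hypothetical helper: lowercase every run of letters and tally it.
    return Counter(word.lower() for word in findall(r"[A-Z]+", text, flags=IGNORECASE))

sample = "The cat and the hat. The end!"  # made-up sample text
counts = count_words(sample)

# most_common() yields (word, count) pairs, highest count first.
for word, number in counts.most_common():
    print(word, number)
```

Counter.most_common() replaces the explicit sorted(..., key=itemgetter(1), reverse=True) call, so the operator import is no longer needed.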

There is always a trade-off: depending on your application, you may need to fine-tune the pattern to allow hyphenated words or apostrophes.
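
One possible way to let hyphens and apostrophes through is sketched below. The pattern WORD and the sample sentence are assumptions for illustration, not part of the original answer; the pattern only accepts a single ASCII hyphen or apostrophe between two runs of letters.

```python
from re import findall, IGNORECASE

# Assumed pattern: runs of letters, optionally joined by single hyphens or apostrophes.
WORD = r"[A-Z]+(?:['-][A-Z]+)*"

sample = "Don't re-use the vendor's well-known key."  # made-up sample text
words = [w.lower() for w in findall(WORD, sample, flags=IGNORECASE)]
print(words)
```

The group is non-capturing, so findall() returns whole matches. Leading or trailing quotes and dashes are still dropped, and typographic apostrophes would need to be added to the character class.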

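If the input file is relatively small, reading the whole file and processing it in one go is fine. If not, read it line by line in a loop with readline() and process each line in turn.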
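
A line-by-line variant might look like this. It is a sketch under stated assumptions: the helper name count_words_by_line is hypothetical, and io.StringIO stands in for a large file so the example runs on its own. Iterating over a file object is used here in place of explicit readline() calls; both yield one line at a time.

```python
import io
from collections import defaultdict
from re import findall, IGNORECASE

def count_words_by_line(lines):
    # Hypothetical helper: `lines` is any iterable of strings, such as an open file.
    wordcount = defaultdict(int)
    for line in lines:  # only one line is held in memory at a time
        for vocab in findall(r"[A-Z]+", line, flags=IGNORECASE):
            wordcount[vocab.lower()] += 1
    return wordcount

# Stand-in for a real file; with a file on disk, pass the open file object instead.
fake_file = io.StringIO("The software is free.\nUse the software.\n")
counts = count_words_by_line(fake_file)
print(counts["the"], counts["software"])
```

With a real file, the call would be wrapped as `with open("license.txt") as file: counts = count_words_by_line(file)`.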

How to remove punctuation from a dictionary in Python
