Question

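I need to write a program that reads a text file and prints the following:

All the unique words in the text
The number of times each word appears in the text
The total word count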

from collections import Counter

# Read the whole file and count whitespace-separated tokens
with open(r"sample_input.txt", "r", encoding="utf-8-sig") as file:
    wordcount = Counter(file.read().split())

for item in wordcount.items():
    print("{}\t-\t{}".format(*item))

The output should look like this:

WORD FREQUENCY 
can - 1
grow - 1
and - 1
shrink - 1
on - 1
demand - 1 
TOTAL = 6

My program counts lowercase and uppercase versions of a word separately. Is there also a way to filter out punctuation?

Answer 1

When building the word list, convert each word to lowercase with str.lower.

from collections import Counter

wordcount = Counter()

# Open the file
with open(r"sample_input.txt", "r", encoding="utf-8-sig") as file:

    # Iterate through each line
    for line in file:

        # split() with no arguments already discards leading and trailing whitespace
        # Lowercase each word before counting it
        wordcount.update(word.lower() for word in line.split())

for key, value in wordcount.items():
    print("{}\t-\t{}".format(key, value))

# Sum up the count of words
num_words = sum(wordcount.values())
print(num_words)

The output will be:

can - 1
grow - 1
and - 1
shrink - 1
on - 1
demand - 1
6
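
The question also asks about punctuation, which lowercasing alone does not handle. One possible approach, shown here as a minimal sketch rather than the only way to do it, is to strip punctuation from both ends of each token with str.strip and string.punctuation before counting. The helper names count_words, count_words_in_file and print_report are made up for this illustration. Note that string.punctuation only covers ASCII punctuation, and that punctuation inside a word (for example the apostrophe in "don't") is deliberately left alone.

```python
import string
from collections import Counter


def count_words(text):
    """Count words case-insensitively, ignoring punctuation at token edges."""
    counts = Counter()
    for token in text.split():
        # Strip ASCII punctuation from both ends, then lowercase
        word = token.strip(string.punctuation).lower()
        if word:  # skip tokens that consisted only of punctuation
            counts[word] += 1
    return counts


def count_words_in_file(path):
    """Hypothetical helper: apply count_words to the contents of a file."""
    with open(path, "r", encoding="utf-8-sig") as file:
        return count_words(file.read())


def print_report(counts):
    """Print the counts in the layout requested in the question."""
    print("WORD FREQUENCY")
    for word, freq in counts.items():
        print("{} - {}".format(word, freq))
    # TOTAL is taken here to mean the sum of all frequencies
    print("TOTAL = {}".format(sum(counts.values())))
```

Calling print_report(count_words_in_file("sample_input.txt")) would then print the table for the file from the question. If TOTAL is meant to be the number of unique words instead, len(counts) could be used in place of the sum.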
