I need to create a program that reads a text file and prints how often each word occurs. Here is what I have so far:
```python
from collections import Counter

# Read the whole file and count whitespace-separated words
with open(r"sample_input.txt", "r", encoding="utf-8-sig") as file:
    wordcount = Counter(file.read().split())

for item in wordcount.items():
    print("{}\t-\t{}".format(*item))
```
The output should look like this:

```
WORD FREQUENCY
can - 1
grow - 1
and - 1
shrink - 1
on - 1
demand - 1
TOTAL = 6
```
My program counts lowercase and uppercase forms of the same word separately. Also, is there a way to filter out punctuation?
Answer (score: 5)
Convert each word to lowercase with `str.lower` as you build the word list:

```python
from collections import Counter

wordcount = Counter()

# Open the file
with open(r"sample_input.txt", "r", encoding="utf-8-sig") as file:
    # Iterate through each line
    for line in file:
        # split() with no arguments already discards leading/trailing
        # whitespace, so no separate strip() is needed
        for word in line.split():
            # Count the lowercased form so "Can" and "can" are merged
            wordcount[word.lower()] += 1

for key, value in wordcount.items():
    print("{}\t-\t{}".format(key, value))

# Sum up the count of words
num_words = sum(wordcount.values())
print(num_words)
```
The output will be:

```
can - 1
grow - 1
and - 1
shrink - 1
on - 1
demand - 1
6
```
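The answer above merges upper- and lowercase forms but leaves the punctuation part of the question open, and it prints a bare total instead of the `WORD FREQUENCY` / `TOTAL = 6` layout the question asks for. Below is a minimal sketch of both, assuming that deleting ASCII punctuation (`string.punctuation`) is good enough for the input. The names `count_words`, `report` and `report_file` are hypothetical helpers, and the exact header spacing is a guess, because the question's layout lost its tabs.

```python
import string
from collections import Counter

# Translation table that deletes ASCII punctuation characters.
# Note: string.punctuation is ASCII-only (curly quotes etc. are kept).
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def count_words(text):
    """Count words case-insensitively after deleting ASCII punctuation."""
    words = text.lower().translate(STRIP_PUNCT).split()
    return Counter(words)


def report(wordcount):
    """Build the WORD FREQUENCY ... TOTAL = n report as a string."""
    lines = ["WORD\tFREQUENCY"]
    for word, freq in wordcount.items():
        lines.append("{}\t-\t{}".format(word, freq))
    lines.append("TOTAL = {}".format(sum(wordcount.values())))
    return "\n".join(lines)


def report_file(path):
    """Read a UTF-8 file (BOM tolerated) and return its word report."""
    with open(path, "r", encoding="utf-8-sig") as file:
        return report(count_words(file.read()))
```

To use it, call `print(report_file("sample_input.txt"))`. Deleting punctuation also joins the pieces of words that contain it, so "don't" becomes "dont" and "on-demand" becomes "ondemand". If that matters, a regular expression such as `re.findall(r"[a-z']+", text.lower())` gives finer control over what counts as part of a word.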