Counting words in Python from a text file

Time: 2019-04-15 14:54:49

Tags: python

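I need to open a text file and count how many times each name listed in another file occurs in it. The program should write the name; count pairs, separated by a semicolon, to a file in .csv format.

It should look like this: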

Jane; 77

Hector; 34

Anna; 39

...

I tried to use "Counter", but what it prints looks like a list, so I think it is the wrong way to do this task.

import re
from collections import Counter

wanted = re.findall(r'\w+', open('iliadcounts.csv').read().lower())
cnt = Counter()
words = re.findall(r'\w+', open('pg6130.txt').read().lower())
for word in words:
    if word in wanted:
        cnt[word] += 1
print(cnt)

But this is definitely not the right code for this task...

1 answer:

Answer 0 (score: 1)

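You can feed the whole list of words to Counter at once and it will do the counting for you. You can then print only the words in wanted by iterating over that list: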

import re
from collections import Counter

# create some demo data, as I do not have your data at hand (uses your filenames)
def create_demo_files():
    with open('iliadcounts.csv', 'w') as f:
        f.write("hug,crane,box")
    with open('pg6130.txt', 'w') as f:
        f.write("hug,shoe,blues,crane,crane,box,box,box,wood")

create_demo_files()


# work with your files
with open('iliadcounts.csv') as f:
    wanted = re.findall(r'\w+', f.read().lower())
with open('pg6130.txt') as f:
    cnt = Counter(re.findall(r'\w+', f.read().lower()))


# print the count of every word in wanted (all words in the text are counted)
for word in wanted:
    # cnt[word] is 0 for a wanted word that never occurs; cnt.get(word) would be None
    print("{}; {}".format(word, cnt[word]))

    # would work as well (f-strings, Python 3.6+):
    # https://docs.python.org/3/library/string.html#string-formatting
    # print(f"{word}; {cnt[word]}")

Output:

hug; 1
crane; 2
box; 3
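
A side note on names that never occur in the text: a Counter answers 0 when indexed with a missing key, while the plain dict method get returns None unless you pass a default. The small sketch below uses made-up data (not your files) to show the difference:

```python
from collections import Counter

# made-up word list, only for illustration
cnt = Counter(["hug", "crane", "crane"])

present = cnt["crane"]        # counted twice
missing_get = cnt.get("box")  # dict.get without a default gives None
missing_idx = cnt["box"]      # Counter gives 0 for a missing key

print(present, missing_get, missing_idx)
```

Indexing a missing key this way does not add it to the Counter, so it is safe to do inside the printing loop.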

Or you can print the whole counter:

print(cnt)

Output:

Counter({'box': 3, 'crane': 2, 'hug': 1, 'shoe': 1, 'blues': 1, 'wood': 1})
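
The question also asks for the pairs to end up in a semicolon-separated .csv file rather than on the screen. Here is a minimal sketch of that step, assuming the standard csv module is acceptable. The helper names count_wanted and write_counts and the output filename name_counts.csv are made up for this example, and the text and name list are the demo data from above rather than your real files:

```python
import csv
import re
from collections import Counter


def count_wanted(text, wanted):
    """Count every word in text, then keep only the wanted names, in their given order."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    return [(name, counts[name]) for name in wanted]


def write_counts(rows, path):
    """Write (name, count) rows to path, one 'name;count' record per line."""
    # newline="" lets the csv module control the line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter=";").writerows(rows)


# demo data from the answer; name_counts.csv is a hypothetical output filename
rows = count_wanted("hug,shoe,blues,crane,crane,box,box,box,wood",
                    ["hug", "crane", "box"])
write_counts(rows, "name_counts.csv")
```

Note that csv.writer puts no space after the semicolon. If you need exactly "name; count" with the space, writing each line yourself with f.write("{}; {}\n".format(name, count)) is probably simpler.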
