I have a list of words:
words = ["hello","my","name"]
files = ["file1.txt","file2.txt"]
I want to count the occurrences of each word in the list across all of the text files.
My current attempt:
import re
occ = []
for file in files:
    try:
        fichier = open(file, encoding="utf-8")
    except:
        pass
    data = fichier.read()
for wrd in words:
    count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(wrd), data))
    occ.append(wrd + " : " + str(count))
texto = open("occurence.txt", "w+b")
for ww in occ:
    texto.write(ww.encode("utf-8")+"\n".encode("utf-8"))
This code works fine with a single file, but when I try it with a list of files it only gives me the result for the last file.
Answer 0 (score: 1)
Use json to store the counts.
For example:
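import json
import re

# Read the stored counts from JSON (start empty on the first run)
try:
    with open('data_store.json') as jfile:
        counts = json.load(jfile)
except FileNotFoundError:
    counts = {}

for file in files:
    with open(file, encoding="utf-8") as fichier:
        text = fichier.read()
    for wrd in words:
        count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(wrd), text))
        if wrd not in counts:
            counts[wrd] = 0
        counts[wrd] += count  # Increment count

# Write result to JSON
with open('data_store.json', "w") as jfile:
    json.dump(counts, jfile)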
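One thing to keep in mind with a JSON store is that it is cumulative: counts from earlier runs stay in the file and new counts are added on top, so processing the same files twice doubles the totals. The read, add, write cycle can be sketched roughly as below. The helper name update_store is hypothetical, and the sketch assumes the store is a flat JSON object mapping each word to an integer.

```python
import json


def update_store(store_path, new_counts):
    """Add new_counts into the JSON file at store_path and return the totals.

    Hypothetical helper; assumes the store is a flat {word: int} JSON object.
    """
    try:
        with open(store_path, encoding="utf-8") as jfile:
            totals = json.load(jfile)
    except FileNotFoundError:
        totals = {}  # first run: nothing stored yet

    # Add each new count on top of whatever was already stored
    for word, count in new_counts.items():
        totals[word] = totals.get(word, 0) + count

    with open(store_path, "w", encoding="utf-8") as jfile:
        json.dump(totals, jfile)
    return totals
```

If you only want the totals for the current run, delete the store file (or skip the read step) before counting.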
Answer 1 (score: 1)
Use a dictionary instead of a list:
import re

occ = {}  # Create an empty dictionary
words = ["hello", "my", "name"]
files = ["f1.txt", "f2.txt", "f3.txt"]
for file in files:
    try:
        fichier = open(file, encoding="utf-8")
    except OSError:
        pass  # Skip files that cannot be opened
    else:
        with fichier:  # Close the file once it has been read
            data = fichier.read()
        for wrd in words:
            count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(wrd), data))
            if wrd in occ:
                occ[wrd] += count  # If wrd is already in dictionary, increment occurrence count
            else:
                occ[wrd] = count  # Else add wrd to dictionary with occurrence count
print(occ)
If you want it as a list of strings, as in your question:
occ_list = [ f"{key} : {value}" for key, value in occ.items() ]
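As a possible variation on the dictionary approach, collections.Counter removes the need for the "is the word already in the dictionary" check, and the result can be written to occurence.txt in text mode instead of encoding each line by hand. This is only a sketch: the helper names count_words and write_counts are hypothetical, and matching stays case-sensitive and whole-word, as in the code above.

```python
import re
from collections import Counter


def count_words(words, files):
    """Count whole-word occurrences of each word across all files (hypothetical helper)."""
    occ = Counter()
    for file in files:
        try:
            with open(file, encoding="utf-8") as fichier:
                data = fichier.read()
        except OSError:
            continue  # skip files that cannot be opened
        for wrd in words:
            # Missing keys default to 0, so no membership check is needed
            occ[wrd] += len(re.findall(r"\b%s\b" % re.escape(wrd), data))
    return occ


def write_counts(occ, path="occurence.txt"):
    """Write one "word : count" line per word (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as out:
        for word, count in occ.items():
            out.write(f"{word} : {count}\n")
```

Usage would look like write_counts(count_words(words, files)), with words and files defined as in the question.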