I have a text file in which each line holds a word and its count (word count):
product 5
order 4
tracking 1
This means that the word product was found 5 times in the input document.
I have a script called WordFrequency.py that finds the words in an input file and counts how often each one occurs:
import re
from collections import Counter

def count_words(file_path):
    with open("/Users/oliverbusk/Sites/Sandbox/storage/app/" + file_path, 'r', encoding="utf-8") as f:
        matches = re.findall(r'\b[a-zA-Z]{3,}\b', f.read())
    wordcount = Counter(matches)
    for word in wordcount:
        string = word + " " + str(wordcount[word])
        write_to_file(string)

def write_to_file(word):
    with open("/Dictionaries/eng.txt", "a+") as f:
        f.write(word + "\n")
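So basically, the code above reads the input file file_path and adds the words and their counts to eng.txt.
However, every time I run it, the results are appended to the end of eng.txt, for example: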
product 5
order 4
tracking 1
product 5
order 4
tracking 1
Instead, if a word already exists in eng.txt, I want its count to be incremented.
Answer 0 (score: 1)
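One approach is to read the existing contents of eng.txt first, add the new counts to them, and then rewrite the whole file.
For example: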
import re
from collections import Counter, defaultdict

def count_words(file_path):
    # Read the existing counts from eng.txt
    data = defaultdict(int)
    with open("/Dictionaries/eng.txt", "r") as f:
        for line in f:
            key, value = line.strip().split()
            data[key] = int(value)

    # Count the words in the input file
    with open("/Users/oliverbusk/Sites/Sandbox/storage/app/" + file_path, 'r', encoding="utf-8") as f:
        matches = re.findall(r'\b[a-zA-Z]{3,}\b', f.read())
    wordcount = Counter(matches)

    # Add the new counts to the existing ones
    for word, count in wordcount.items():
        data[word] += count

    # Write the merged counts back to the file
    write_to_file(data)

def write_to_file(data):
    # Open in "w" mode so the file is overwritten rather than appended to
    with open("/Dictionaries/eng.txt", "w") as f:
        for word, count in data.items():
            string = word + " " + str(count)
            f.write(string + "\n")
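
Note that the read step above assumes eng.txt already exists and that every line splits into exactly two fields. A minimal sketch of a more forgiving variant follows. The function names load_counts and merge_counts, and passing both paths as arguments instead of hard-coding them, are my own assumptions for illustration and are not part of the original script. It starts from an empty count when the dictionary file is missing, skips blank or malformed lines, and uses Counter.update to add the new occurrences to the stored ones.

```python
import re
from collections import Counter
from pathlib import Path

# Same pattern as in the question: words of three or more ASCII letters.
WORD_RE = re.compile(r'\b[a-zA-Z]{3,}\b')

def load_counts(dict_path):
    """Read 'word count' lines into a Counter (hypothetical helper).

    A missing file yields an empty Counter.
    """
    counts = Counter()
    path = Path(dict_path)
    if not path.exists():
        return counts
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            word, value = parts
            counts[word] += int(value)
    return counts

def merge_counts(input_path, dict_path):
    """Add the word counts of input_path to those stored in dict_path."""
    counts = load_counts(dict_path)
    with open(input_path, "r", encoding="utf-8") as f:
        # Counter.update with an iterable adds one per occurrence.
        counts.update(WORD_RE.findall(f.read()))
    # "w" mode rewrites the file, so each word appears only once.
    with open(dict_path, "w", encoding="utf-8") as f:
        for word, count in counts.items():
            f.write(f"{word} {count}\n")
    return counts
```

Calling merge_counts twice on the same input should double every count in the dictionary file rather than duplicate its lines. The match is still case-sensitive, as in the question, so Product and product are counted separately.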