Question

I have a text file with the following structure, where each line is a word followed by its count:

product 5
order 4
tracking 1

This means that the word product was found 5 times in the input document.

I have a script named WordFrequency.py that finds the words in an input file and counts how many times each one occurs:

import re
from collections import Counter

def count_words(file_path):
    with open("/Users/oliverbusk/Sites/Sandbox/storage/app/" + file_path, 'r', encoding="utf-8") as f:

        matches = re.findall(r'\b[a-zA-Z]{3,}\b', f.read())

        wordcount = Counter(matches)

        for word in wordcount:
            string = word + " " + str(wordcount[word])
            write_to_file(string)

def write_to_file(word):
    with open("/Dictionaries/eng.txt", "a+") as f:
        f.write(word + "\n")

So basically, the code above reads the input file file_path and adds each word and its count to eng.txt.

However, every time I run it, the results are appended to the end of eng.txt, like this:

product 5
order 4
tracking 1
product 5
order 4
tracking 1

Instead, if a word already exists in eng.txt, I would like its count to be incremented.

Answer 1

One approach is to read the existing contents of the file first and then increment the counts.

For example:

import re
from collections import Counter, defaultdict

def count_words(file_path):
    # Read the existing counts from eng.txt
    with open("/Dictionaries/eng.txt", "r") as f:
        data = defaultdict(int)
        for line in f:
            key, value = line.strip().split()
            data[key] = int(value)

    with open("/Users/oliverbusk/Sites/Sandbox/storage/app/" + file_path, 'r', encoding="utf-8") as f:
        matches = re.findall(r'\b[a-zA-Z]{3,}\b', f.read())
        wordcount = Counter(matches)
        for word, count in wordcount.items():
            data[word] += count                 #Increment Count

    #Write To File
    write_to_file(data)

def write_to_file(data):
    with open("/Dictionaries/eng.txt", "w") as f:
        for word, count in data.items():
            string = word + " " + str(count)
            f.write(string + "\n")
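
The same read, increment, and rewrite idea can also be sketched with Counter alone. This is only a minimal variant under a few assumptions: the helper names load_counts and update_counts are hypothetical, both paths are passed in as arguments instead of being hard-coded, a missing dictionary file is treated as empty, and lines that do not contain exactly two fields are skipped.

```python
import re
from collections import Counter
from pathlib import Path

# Same pattern as in the question: words of three or more ASCII letters.
WORD_PATTERN = re.compile(r"\b[a-zA-Z]{3,}\b")

def load_counts(dict_path):
    """Read 'word count' lines into a Counter (hypothetical helper)."""
    counts = Counter()
    path = Path(dict_path)
    if not path.exists():
        # Assumption: a missing dictionary file simply means no counts yet.
        return counts
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            word, value = parts
            # += merges duplicate lines left behind by earlier appending runs
            counts[word] += int(value)
    return counts

def update_counts(input_path, dict_path):
    """Add the word counts of input_path to those stored in dict_path."""
    counts = load_counts(dict_path)
    text = Path(input_path).read_text(encoding="utf-8")
    # Counter.update adds to existing counts instead of replacing them.
    counts.update(WORD_PATTERN.findall(text))
    # Mode "w" rewrites the whole file, so every word appears exactly once.
    with open(dict_path, "w", encoding="utf-8") as f:
        for word, count in counts.items():
            f.write(f"{word} {count}\n")
    return counts
```

Calling update_counts("input.txt", "eng.txt") twice on the same input doubles each count rather than duplicating lines. Because load_counts sums repeated words, it would also collapse an eng.txt that already contains duplicated entries from earlier runs.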
