Question

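I am trying to loop through 3 text files and count every unique word in them, building a library of those unique words for each file, and then add those libraries (which I have made into lists) together. For example, 'file1.txt', 'file2.txt' and 'file3.txt' each have their own library.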

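def byFreq(pair):
    # Sort key: the count stored in the second slot of a (word, count) pair.
    return pair[1]

def memes():
    for filename in ['file1.txt', 'file2.txt', 'file3.txt']:
        txt = open(filename, 'r').read()
        txt = txt.lower()
        # Turn punctuation into spaces so split() yields bare words.
        for char in '!""#$%&()*+,-./:;<=>?@[\\]^_`{|}~\" ':
            txt = txt.replace(char, ' ')
        words = txt.split()

        # Assumption: the post never shows where counts is created;
        # here it is reset for every file.
        counts = {}
        for w in words:
            counts[w] = counts.get(w, 0) + 1

        itms = list(counts.items())
        itms.sort()
        itms.sort(key=byFreq, reverse=True)
        # Print at most the 50 most frequent words.
        for word, count in itms[:50]:
            print("{0:<15}{1:>5}".format(word, count))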

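This is what I have so far. It loops through all the files correctly, but for each word it only returns the count from the first iteration in which that word appears.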

 the            14352
 of              6617
 and             6355
 a               4644
 to              4605

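Because 'file1.txt' contains 'the' 14352 times, it returns that number and discards the other instances of 'the' in the other files. It does the same for the other words, until it reaches a unique word in one of the other files that has not appeared before.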

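What I want it to do is add up all instances of 'the' across all the files, and likewise for every other word. I'm stuck on the part above. Any help is appreciated.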

Answer 1

Sorry for my English, I don't speak it well.

I solved this exercise this way:

import collections
counter = collections.Counter()

for filename in ['file1.txt', 'file2.txt', 'file3.txt']:
    txt = open(filename, 'r').read()
    txt = txt.lower()
    for char in '!""#$%&()*+,-./:;<=>?@[\\]^_`{|}~\" ':
        txt = txt.replace(char, ' ')
    words = txt.split()
    for word in words:
        counter[word] += 1

for word, count in counter.most_common(5): #print 5 most common words
    print("{0:<15}{1:>5}".format(word, count))
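
If you also want to keep a separate count for each file, as the question describes, and only then add them together, something like the sketch below might work. The helper names count_words and combine_counts are made up for this example, and it assumes the files are plain text that open() can read with the default encoding. Counter.update() adds counts to the ones already stored instead of overwriting them, which is the "add together" step the question asks for.

```python
import collections

# Same punctuation set as the loop above, without the repeated characters.
PUNCTUATION = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~'

def count_words(filename):
    """Return a Counter of the lower-cased words in one file (hypothetical helper)."""
    with open(filename, 'r') as f:
        txt = f.read().lower()
    # Turn punctuation into spaces so split() yields bare words.
    for char in PUNCTUATION:
        txt = txt.replace(char, ' ')
    return collections.Counter(txt.split())

def combine_counts(filenames):
    """Return (per_file, total): one Counter per file plus their sum."""
    per_file = {name: count_words(name) for name in filenames}
    total = collections.Counter()
    for counts in per_file.values():
        # update() adds to the existing counts, it does not replace them.
        total.update(counts)
    return per_file, total
```

Calling combine_counts(['file1.txt', 'file2.txt', 'file3.txt']) gives back both the per-file counters and the combined one, so total.most_common(5) can be printed the same way as above.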

Bye

Python: how to put multiple .txt files each into their own library.
