Question

So I have a bunch of text files. I have to read them, count each word, and write the counts to a different file in the form (word) (file) (count):

word1 file1 5
word1 file2 3
word2 file1 2
word2 file3 5

Then I need to sort them and merge all identical words, keeping the file and count for each, like this:

word1 file1:5 file2:3
word2 file1:2 file3:5

Finally, I need a search function that takes two words and finds the names of only those files that contain both search words:

word1 in file1 counted 5
word2 in file1 counted 2

I've done the sorting, but I still need to do the merge and the search :(

Answer 1

Assuming you have already completed the first step (which seems to be the case), you could proceed as follows:

# This is what you start with: (word, file, count) tuples
words = [('word1', 'file1', 5),
         ('word1', 'file2', 3),
         ('word2', 'file1', 2),
         ('word2', 'file3', 5)]

# Group by word: {word: {file: count}}
simple = {}
for word, f, count in words:
    # setdefault creates the inner dict on first sight of a word,
    # without a bare except that could hide unrelated errors
    simple.setdefault(word, {})[f] = count

print(simple)

# Find the files which contain both w1 and w2
def findTwoWords(data, w1, w2):
    # .get() yields an empty dict for an unknown word instead of raising KeyError
    files1 = set(data.get(w1, {}))
    files2 = set(data.get(w2, {}))
    return files1 & files2

print('"word1" and "word2" appear together in {}'.format(findTwoWords(simple, 'word1', 'word2')))
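To get from there to the exact output formats shown in the question, something along these lines might work. This is only a sketch: the helper names `parse_rows`, `merge_counts`, `format_merged`, and `search_report` are made up for illustration, and it assumes the first-step output is one whitespace-separated `word file count` entry per line, with no spaces in the file names.

```python
from collections import defaultdict


def parse_rows(lines):
    """Parse 'word file count' lines into (word, file, count) tuples."""
    rows = []
    for line in lines:
        if line.strip():  # skip blank lines
            word, filename, count = line.split()
            rows.append((word, filename, int(count)))
    return rows


def merge_counts(rows):
    """Group (word, file, count) rows into {word: {file: count}}."""
    merged = defaultdict(dict)
    for word, filename, count in rows:
        merged[word][filename] = count
    return dict(merged)


def format_merged(merged):
    """Return lines like 'word1 file1:5 file2:3', sorted by word, then by file."""
    lines = []
    for word in sorted(merged):
        pairs = ' '.join('{}:{}'.format(f, c) for f, c in sorted(merged[word].items()))
        lines.append('{} {}'.format(word, pairs))
    return lines


def search_report(merged, w1, w2):
    """Return lines like 'word1 in file1 counted 5' for files containing both words."""
    common = set(merged.get(w1, {})) & set(merged.get(w2, {}))
    lines = []
    for filename in sorted(common):
        for word in (w1, w2):
            lines.append('{} in {} counted {}'.format(word, filename, merged[word][filename]))
    return lines


rows = parse_rows(['word1 file1 5', 'word1 file2 3', 'word2 file1 2', 'word2 file3 5'])
merged = merge_counts(rows)
print('\n'.join(format_merged(merged)))
print('\n'.join(search_report(merged, 'word1', 'word2')))
```

With the sample data from the question, the first `print` gives the two merged lines and the second gives the two "counted" lines for file1. If a (word, file) pair appears more than once in the input, the last count wins; summing them instead would be a small change in `merge_counts`.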
