How do I merge and search for duplicate values?

Time: 2014-02-16 04:20:58

Tags: python

I have a bunch of text files. I have to read them, count each word, and write the results to a different file in the form (word) (file) (count):

word1 file1 5
word1 file2 3
word2 file1 2
word2 file3 5

Then I need to sort them and merge all identical words, keeping each file and its count, like this:

word1 file1:5 file2:3
word2 file1:2 file3:5

After that I need a search function that takes two words and finds only the file names that contain both search words:

word1 in file1 counted 5
word2 in file1 counted 2

I have done the sorting, but I still need to do the merging and the search :(

1 answer:

Answer 0: (score: 0)

Assuming you have already completed the first step (which seems to be the case), you can proceed as follows:

#this is what you start with
words = [ ('word1', 'file1', 5),
    ('word1', 'file2', 3),
    ('word2', 'file1', 2),
    ('word2', 'file3', 5) ]

# Group by word: {word: {file: count}}
simple = {}
for word, f, count in words:
    # setdefault creates the inner dict the first time a word is seen
    simple.setdefault(word, {})[f] = count

print(simple)

# Find the files which contain both w1 and w2
def findTwoWords(data, w1, w2):
    # .get(..., {}) yields an empty set for a word that was never counted
    files1 = set(data.get(w1, {}))
    files2 = set(data.get(w2, {}))
    return files1 & files2

print('"word1" and "word2" appear together in {}'.format(findTwoWords(simple, 'word1', 'word2')))
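
The question also asks for merged lines of the form "word1 file1:5 file2:3" and for a per-file report such as "word1 in file1 counted 5". Below is a minimal sketch of both, assuming the same (word, file, count) tuples as above. The helper names group_counts, merged_lines and search_report are hypothetical, chosen only for illustration, and collections.defaultdict is used here for the grouping step.

```python
from collections import defaultdict

# Same sample data as in the answer above.
words = [('word1', 'file1', 5),
         ('word1', 'file2', 3),
         ('word2', 'file1', 2),
         ('word2', 'file3', 5)]


def group_counts(rows):
    """Build {word: {file: count}} from (word, file, count) tuples."""
    grouped = defaultdict(dict)
    for word, filename, count in rows:
        grouped[word][filename] = count
    return grouped


def merged_lines(grouped):
    """Return one 'word file:count file:count' line per word, sorted."""
    lines = []
    for word in sorted(grouped):
        parts = ['{}:{}'.format(f, n) for f, n in sorted(grouped[word].items())]
        lines.append('{} {}'.format(word, ' '.join(parts)))
    return lines


def search_report(grouped, w1, w2):
    """Return 'word in file counted n' lines for files containing both words."""
    # .get avoids inserting a missing word into the defaultdict
    common = set(grouped.get(w1, {})) & set(grouped.get(w2, {}))
    report = []
    for f in sorted(common):
        for w in (w1, w2):
            report.append('{} in {} counted {}'.format(w, f, grouped[w][f]))
    return report


grouped = group_counts(words)
print('\n'.join(merged_lines(grouped)))
print('\n'.join(search_report(grouped, 'word1', 'word2')))
```

With the sample data this prints the two merged lines and the two report lines shown in the question. A word that was never counted simply produces an empty report.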