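So I have a bunch of text files. I have to read them, count each word, and write the counts to a separate file in the form (word) (file) (count), like this: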
word1 file1 5
word1 file2 3
word2 file1 2
word2 file3 5
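For reference, the counting step could look roughly like this. It is a minimal sketch that assumes the file contents are already loaded into a dict mapping file name to text, and that words are split on whitespace with no case folding or punctuation stripping. The function name `count_words` is hypothetical.

```python
from collections import Counter


def count_words(texts):
    """Return (word, file, count) tuples for a {file_name: text} dict.

    Assumption: words are split on whitespace only; no lowercasing
    or punctuation stripping is done.
    """
    rows = []
    for file_name, text in texts.items():
        # Counter tallies how often each word occurs in this one file.
        for word, count in Counter(text.split()).items():
            rows.append((word, file_name, count))
    return rows


texts = {'file1': 'word1 word2 word1', 'file2': 'word1'}
for word, file_name, count in count_words(texts):
    print(word, file_name, count)
```

To write the rows to a file instead of the console, `print(word, file_name, count, file=out)` works with any file opened for writing.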
Then I need to sort them and merge the lines for each word, keeping every file and its count, like this:
word1 file1:5 file2:3
word2 file1:2 file3:5
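The sort-and-merge step could be sketched as follows, assuming the (word, file, count) rows are in memory. Sorting the tuples orders them by word and then by file, and `itertools.groupby` then collects consecutive rows that share the same word. The function name `merge_lines` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def merge_lines(rows):
    """Turn (word, file, count) rows into 'word file:count ...' lines."""
    lines = []
    # Sorting first matters: groupby only groups *consecutive* equal keys.
    for word, group in groupby(sorted(rows), key=itemgetter(0)):
        entries = ' '.join('{}:{}'.format(f, c) for _, f, c in group)
        lines.append('{} {}'.format(word, entries))
    return lines


rows = [('word2', 'file3', 5), ('word1', 'file1', 5),
        ('word2', 'file1', 2), ('word1', 'file2', 3)]
for line in merge_lines(rows):
    print(line)
```

If the rows are already sorted (as in the question), the `sorted()` call is harmless but could be dropped.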
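Then I need a search function that takes two words and finds the names of the files that contain both search words, with output like this: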
word1 in file1 counted 5
word2 in file1 counted 2
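A search that prints in exactly that format might look like this. It is a sketch that assumes the merged data is held as a `{word: {file: count}}` dict; the names `search_two_words` and `index` are hypothetical.

```python
def search_two_words(index, w1, w2):
    """Return report lines for the files that contain both w1 and w2."""
    # A word that never occurs maps to an empty dict, so it matches no files.
    files1 = index.get(w1, {})
    files2 = index.get(w2, {})
    common = sorted(set(files1) & set(files2))
    lines = []
    for f in common:
        lines.append('{} in {} counted {}'.format(w1, f, files1[f]))
        lines.append('{} in {} counted {}'.format(w2, f, files2[f]))
    return lines


index = {'word1': {'file1': 5, 'file2': 3},
         'word2': {'file1': 2, 'file3': 5}}
for line in search_two_words(index, 'word1', 'word2'):
    print(line)
```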
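I've done the sorting but still need to do the merging and the search :(
Answer (score: 0)
Assuming you've already done the first step (which seems to be the case), you can do the following: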
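# this is what you start with: (word, file, count) tuples
words = [('word1', 'file1', 5),
         ('word1', 'file2', 3),
         ('word2', 'file1', 2),
         ('word2', 'file3', 5)]

# group by word: {word: {file: count}}
simple = {}
for word, f, count in words:
    # setdefault creates the inner dict the first time a word is seen,
    # avoiding a bare try/except around the KeyError
    simple.setdefault(word, {})[f] = count
print(simple)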
# find the files that contain both w1 and w2
def findTwoWords(data, w1, w2):
    # .get(..., {}) returns an empty set for unknown words instead of raising KeyError
    files1 = set(data.get(w1, {}))
    files2 = set(data.get(w2, {}))
    return files1 & files2  # set intersection

print('"word1" and "word2" appear together in {}'.format(findTwoWords(simple, 'word1', 'word2')))
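Since the sorted "word file count" lines are already in a file, the `simple` dict above can also be rebuilt straight from that file before calling `findTwoWords`. This is a minimal sketch that assumes one whitespace-separated entry per line. The helper name `load_index` is hypothetical, and `io.StringIO` stands in for an open file.

```python
import io


def load_index(lines):
    """Build {word: {file: count}} from 'word file count' lines."""
    index = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        word, f, count = parts
        index.setdefault(word, {})[f] = int(count)
    return index


sample = io.StringIO('word1 file1 5\nword1 file2 3\nword2 file1 2\nword2 file3 5\n')
index = load_index(sample)
print(index)
```

With a real file, `with open('counts.txt') as fh: index = load_index(fh)` would do the same (the file name is only an example), since iterating over a file object yields its lines.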