Question

file1 has more than 1000 lines, for example:

:)
still good
not
candy....wasn't even the good stuff.
how could i ever forget? #biggestdayoftheyear
not even think
will be

file2 has more than 1000 lines, for example:

1,even,2
2,be,1
3,good,2
4,:),1
5,forget?,1
6,i,1
7,stuff.,1
8,#biggestdayoftheyear,1
9,think,1
10,will,1
11,how,1
12,not,2
13,the,1
14,still,1
15,ever,1
16,could,1
17,candy....wasn't,1

Code:

file1 = 'C:/Users/Desktop/file1.txt'
file2 = 'C:/Users/Desktop/file2.txt'

with open(file1) as f1:
    for line1 in f1:
        sline1 = str(line1.strip().split(' '))
        print sline1

with open(file2) as f2:
    for line2 in f2:
        sline2 = line2.split(',')
        #print sline2[0], sline2[1]
        if sline2[1] in sline1:
            print sline1.replace(sline1, sline2[0])

The output shows only the following:

2
6
10

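What am I missing? Any suggestions?

I want to replace every word in file1 with the number from column 1 of file2, after checking against column 2 that it is the same word.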

Expected result:

4
14 3
12
17 1 13 3 7
11 16 6 15 5 8
12 1 9
10 2

Answer 1

You need to build an inverted index from file2.

# Maps each word (column 2 of file2) to its number (column 1).
inverted_index = {}
with open(file2) as f2:
    for line in f2:
        key, value, _ = line.split(',')
        inverted_index[value] = key

Then use that inverted index for the lookup while looping over file1:

with open(file1) as f1:
    for line in f1:
        # Words missing from the index are left unchanged.
        print(' '.join([inverted_index.get(word, word) for word in line.strip().split(' ')]))
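For reference, here is a self-contained Python 3 sketch of the same idea that runs on in-memory lines instead of files. The function names `build_inverted_index` and `replace_words` are made up for illustration, and the sample data is a small subset of the question's files plus one unknown line. Two things differ from the snippet above and are assumptions on my part: the parsing splits the number off from the left and the count off from the right, so a word that itself contains a comma would survive, and `line.split()` splits on any whitespace rather than on single spaces.

```python
def build_inverted_index(index_lines):
    """Map each word (column 2) to its number (column 1)."""
    inverted_index = {}
    for line in index_lines:
        line = line.rstrip('\n')
        if not line:
            continue
        # Take the number from the left and the count from the right,
        # so a word containing a comma stays in one piece.
        word_id, rest = line.split(',', 1)
        word, _count = rest.rsplit(',', 1)
        inverted_index[word] = word_id
    return inverted_index


def replace_words(text_lines, inverted_index):
    """Return each line with known words replaced by their numbers."""
    result = []
    for line in text_lines:
        words = line.split()
        # Unknown words fall back to themselves.
        result.append(' '.join(inverted_index.get(w, w) for w in words))
    return result


# Small subset of the sample data from the question.
index_lines = ["1,even,2", "2,be,1", "3,good,2", "4,:),1", "10,will,1", "14,still,1"]
text_lines = [":)", "still good", "will be", "unknown word"]

replaced = replace_words(text_lines, build_inverted_index(index_lines))
for line in replaced:
    print(line)
```

With real files, you would pass the open file objects in place of the two lists, since iterating a file yields its lines.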

Answer 2

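I noticed that you loop through file1 and explicitly set sline1 on each iteration. Only after exiting that loop do you loop through file2 to do the comparison. As a result, you only ever process the last value of sline1, because the first loop has already finished. Once you build the dictionary-based inverted index that Menno shows, you can set up the replacement process.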
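A small Python 3 sketch of that behavior, using two of the question's lines in a list instead of a file (the list contents and the names `lines`, `candidates` and `matches` are only for illustration). It also suggests why 6 shows up in the output: `sline1` is the string form of a list, so `in` performs a substring test and 'i' matches inside 'will'.

```python
lines = ["not even think", "will be"]

for line1 in lines:
    # Same expression as in the question: the str() of a list of words.
    sline1 = str(line1.strip().split(' '))

# After the loop, sline1 holds only the last line's value.
print(sline1)

# `in` on a string is a substring test, not a whole-word test.
candidates = ["even", "be", "i", "will", "not"]
matches = [w for w in candidates if w in sline1]
print(matches)
```

The three matching words correspond to the numbers 2, 6 and 10 in file2, which is the output reported in the question.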

Compare two files and replace
