file1 has more than 1000 lines, for example:
:)
still good
not
candy....wasn't even the good stuff.
how could i ever forget? #biggestdayoftheyear
not even think
will be
file2 has more than 1000 lines, for example:
1,even,2
2,be,1
3,good,2
4,:),1
5,forget?,1
6,i,1
7,stuff.,1
8,#biggestdayoftheyear,1
9,think,1
10,will,1
11,how,1
12,not,2
13,the,1
14,still,1
15,ever,1
16,could,1
17,candy....wasn't,1
Code:
file1 = 'C:/Users/Desktop/file1.txt'
file2 = 'C:/Users/Desktop/file2.txt'
with open(file1) as f1:
    for line1 in f1:
        sline1 = str(line1.strip().split(' '))
        print sline1
with open(file2) as f2:
    for line2 in f2:
        sline2 = line2.split(',')
        #print sline2[0], sline2[1]
        if sline2[1] in sline1:
            print sline1.replace(sline1, sline2[0])
The result only shows the following:
2
6
10
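What am I missing? Any suggestions?
I want to replace every word in file1 with the number from column 1 of file2, after checking that the word matches the one in column 2.
Expected result: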
4
14 3
12
17 1 13 3 7
11 16 6 15 5 8
12 1 9
10 2
Answer 0 (score: 1)
You need to build an inverted index from file2.
inverted_index = {}
with open(file2) as f2:
    for line in f2:
        # columns are: number, word, count
        key, value, _ = line.split(',')
        inverted_index[value] = key
Then use that inverted index for the lookup while looping over file1:
with open(file1) as f1:
    for line in f1:
        print ' '.join([inverted_index.get(word, word) for word in line.strip().split(' ')])
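For anyone on Python 3, the same idea might look roughly like the sketch below. It uses small in-memory samples instead of the real files, and the helper names build_inverted_index and replace_words are made up for illustration. It also assumes the number and the count never contain a comma, so splitting from both ends keeps a word that itself contains a comma intact.

```python
import io


def build_inverted_index(lines):
    """Map each word (column 2) to its number (column 1)."""
    index = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split once from the left and once from the right, so a word
        # that itself contains a comma is not broken apart.
        key, rest = line.split(",", 1)
        word, _count = rest.rsplit(",", 1)
        index[word] = key
    return index


def replace_words(lines, index):
    """Return each line with known words replaced by their numbers."""
    # Words missing from the index are kept unchanged.
    return [" ".join(index.get(w, w) for w in line.split()) for line in lines]


# Hypothetical stand-ins for file2 and file1, taken from the question's samples.
file2_sample = io.StringIO("1,even,2\n2,be,1\n9,think,1\n10,will,1\n12,not,2\n")
file1_sample = io.StringIO("not even think\nwill be\n")

index = build_inverted_index(file2_sample)
result = replace_words(file1_sample, index)
print("\n".join(result))
```

With the real files, you would pass the open file objects in place of the io.StringIO samples. Note that line.split() splits on any run of whitespace, which differs slightly from split(' ') when a line contains consecutive spaces.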
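Answer 1 (score: 0)
I noticed that you loop through file1 and explicitly set sline1 on each pass. Only after exiting that loop do you loop through file2 for the comparison. As a result, you only ever process the last value of sline1 (because you have already left that loop). Once you build the dictionary-based inverted index as Menno showed, you can set up the replacement process.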
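To make the effect concrete, here is a small Python 3 sketch that reproduces the behaviour with a few of the sample rows from the question. The vocab list is a hand-picked subset of file2, used here only for illustration. Besides keeping only the last line, the original code turns the word list into a string with str(), so the "in" test is a substring check rather than a word check.

```python
lines = ["not even think", "will be"]

for line1 in lines:
    sline1 = str(line1.strip().split(' '))

# After the loop, only the last line survives, and it is a string, not a list.
print(sline1)

# A few (number, word) pairs taken from the question's file2 sample.
vocab = [("2", "be"), ("6", "i"), ("10", "will"), ("12", "not")]

# "in" on a string is a substring test, so "i" matches inside "will".
matches = [num for num, word in vocab if word in sline1]
print(matches)
```

This is consistent with the output 2, 6, 10 reported in the question: "be" and "will" are the words of the last line, and "i" matches only because it occurs inside "will".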