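I wrote a custom file-comparison script tailored to the kind of files I'm checking. Essentially, each file is read into a dictionary, and the script then looks up each line by hash to determine whether it exists in the other file.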
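f = open('test.txt', 'r')
f2 = open('test2.txt', 'r')
m1 = {}
m2 = {}
# Index every line of each file by its exact text (trailing newline included).
for line in f:
    m1[line] = line
for line2 in f2:
    m2[line2] = line2
# Report lines of test.txt that have no exact match in test2.txt.
for k in m1:
    try:
        l = m2[k]
    except KeyError:
        print(m1[k])
f.close()
f2.close()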
I put a junk line in one of the files, but the script did not print it. Why wasn't the junk line detected?
Answer 0: (score: 0)
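Following @PeterWood's suggestion, use sets and compare in both directions. Your loop only looks for lines of test.txt that are missing from test2.txt, so if the junk line was added to test2.txt it is never reported: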
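def get_line_set(fname):
    # Read the file into a set of lines, dropping trailing whitespace
    # (including the newline) so the last line compares cleanly.
    with open(fname) as inf:
        return set(line.rstrip() for line in inf)

f1 = get_line_set("test.txt")
f2 = get_line_set("test2.txt")

# Lines present in test.txt but missing from test2.txt.
only_in_f1 = f1 - f2
if only_in_f1:
    print("\nThe following lines appear in f1 but not in f2:")
    print("\n".join(sorted(only_in_f1)))

# Lines present in test2.txt but missing from test.txt.
only_in_f2 = f2 - f1
if only_in_f2:
    print("\nThe following lines appear in f2 but not in f1:")
    print("\n".join(sorted(only_in_f2)))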
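
To see why the direction of the check matters, here is a minimal sketch that runs both approaches on in-memory data instead of real files. The helper names `one_way_missing` and `two_way_diff` and the sample contents are made up for illustration, and it assumes the junk line was added to the second file, which the question does not state.

```python
import io

def one_way_missing(lines_a, lines_b):
    """Lines of A with no exact match in B (mirrors the question's loop)."""
    seen_b = set(lines_b)
    return [line for line in lines_a if line not in seen_b]

def two_way_diff(lines_a, lines_b):
    """Lines unique to each side, with trailing whitespace stripped first."""
    set_a = {line.rstrip() for line in lines_a}
    set_b = {line.rstrip() for line in lines_b}
    return sorted(set_a - set_b), sorted(set_b - set_a)

# Hypothetical file contents; only the second one carries the junk line.
file_a = io.StringIO("alpha\nbeta\n").readlines()
file_b = io.StringIO("alpha\njunk\nbeta\n").readlines()

print(one_way_missing(file_a, file_b))  # [] -> the junk line goes unnoticed
print(two_way_diff(file_a, file_b))     # ([], ['junk'])
```

Note that `rstrip()` also removes trailing spaces and tabs, so two lines that differ only in trailing whitespace are treated as equal. If that distinction matters for your files, strip only the newline instead, for example with `line.rstrip("\n")`.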