for line in open("data1.txt", "r"):
    for line2 in open("data2.txt", "r"):
        if line == line2:
            print(line)
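Is there any way, or any code, to make this faster? The script has been running for 5 days and still hasn't finished. Is there also a way to show a percentage or the current line number while it runs?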
Answer 0 (score: 4)
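Use a set and invert the logic: load the lines of the smaller file (about 50 MB) into a set, then make a single pass over the large file and check whether each of its lines is in that set. The nested loops reread the whole inner file for every line of the outer file, whereas this reads each file only once: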
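with open("data1.txt", "r") as f1, open("data2.txt", "r") as f2:
    # Build the set from the smaller file (swap the two names if data2.txt is the smaller one)
    lines = set(f1)  # efficient O(1) average lookups using a set
    for line in f2:  # single pass over the large file
        if line in lines:
            print(line)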
If you want line numbers, use enumerate:
with open("data1.txt", "r") as f1, open("data2.txt", "r") as f2:
    lines = set(f1)  # efficient O(1) average lookups using a set
    for line_no, line in enumerate(f2, 1):  # single pass over the large file, numbered from 1
        # print(line_no)  # uncomment if you want to see every line number
        if line in lines:
            print(line.rstrip("\n"), line_no)  # strip the newline so the number stays on the same line
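The question also asks for a percentage. One possible way to get it, as a rough sketch rather than a definitive implementation, is to open the files in binary mode, add up the bytes consumed so far, and compare that with the size of the large file. The function name `common_lines` and the `report_every` parameter are made up for this illustration, and it assumes the smaller file fits in memory. Because the comparison is on raw bytes, lines that differ only in line endings (`\r\n` vs `\n`) will not match.

```python
import os
import sys


def common_lines(small_path, large_path, report_every=1_000_000):
    """Yield (line_no, line) for each line of large_path that also occurs in small_path.

    Hypothetical helper: lines are compared as raw bytes, newline included.
    """
    with open(small_path, "rb") as small:
        wanted = set(small)  # set of bytes lines, O(1) average membership test

    total = os.path.getsize(large_path) or 1  # avoid dividing by zero on an empty file
    done = 0  # bytes of the large file consumed so far

    with open(large_path, "rb") as large:
        for line_no, line in enumerate(large, 1):
            done += len(line)
            if line_no % report_every == 0:
                # Progress goes to stderr so it does not mix with the matches
                print(f"{100 * done / total:.1f}% (line {line_no})", file=sys.stderr)
            if line in wanted:
                yield line_no, line
```

It could then be used as `for line_no, line in common_lines("data1.txt", "data2.txt"): print(line_no, line.decode(), end="")`, again assuming data1.txt is the smaller file.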