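I have two files, fileA and fileB. I want to get the line numbers of all lines in fileB that exist in fileA. However, even if a line does appear in fileA, I do not count it as "existing in fileA" unless the line that follows it is also there. So I wrote the following code: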
def compare_two(fileA, fileB):
    with open(fileA, 'r') as fa:
        fa_content = fa.read()  # the whole of fileA is held in memory
    with open(fileB, 'r') as fb:
        keep_line_num = []  # numbers of the fileB lines that are not in fileA
        i = 1
        while True:
            line = fb.readline()
            if line == '':  # end of file; neither file contains blank lines
                break
            last_pos = fb.tell()
            theFollowing = line
            new_line = fb.readline()  # get the next line
            theFollowing += new_line
            fb.seek(last_pos)  # rewind so the next line is read again as `line`
            # substring search over all of fileA, repeated for every line of fileB
            if theFollowing not in fa_content:
                keep_line_num.append(i)
            i += 1
    # both files are already closed by the `with` blocks
    return keep_line_num

# fileA and fileB are the paths of the two files
compare_two(fileA, fileB)
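This works for small files, but I want to use it on files as large as 2GB, and this approach is too slow for that. Is there another way to solve this in Python 2.7?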
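For reference, one direction that might avoid the repeated substring search is to index fileA once and then make a single pass over fileB. The sketch below is only an illustration under stated assumptions, not a drop-in replacement: the name `compare_two_fast` is hypothetical, it matches whole lines (the original `in fa_content` test also matches partial lines), it ignores trailing newlines, and it assumes the set of line pairs from fileA fits in memory, which may not hold for a 2GB file unless hashes are stored instead of the lines themselves.

```python
def compare_two_fast(file_a, file_b):
    """Return the 1-based numbers of the lines in file_b whose
    (line, next line) pair never occurs as consecutive lines in file_a.

    Assumption: whole-line matching, unlike the substring test above.
    """
    pairs = set()    # consecutive (line, next_line) pairs seen in file_a
    singles = set()  # every line of file_a, used only for file_b's last line
    with open(file_a, 'r') as fa:
        prev = None
        for raw in fa:
            line = raw.rstrip('\n')
            singles.add(line)
            if prev is not None:
                pairs.add((prev, line))
            prev = line

    keep_line_num = []
    with open(file_b, 'r') as fb:
        prev = None
        i = 0  # number of the line currently held in `prev`
        for raw in fb:
            line = raw.rstrip('\n')
            # `prev` is line i; it is kept when its pair is missing from file_a
            if prev is not None and (prev, line) not in pairs:
                keep_line_num.append(i)
            prev = line
            i += 1
        # The last line has no successor, so it is tested on its own.
        if prev is not None and prev not in singles:
            keep_line_num.append(i)
    return keep_line_num
```

Each set lookup takes roughly constant time on average, so the work grows with the combined size of the two files rather than with their product. The code uses nothing beyond built-ins, so it should run unchanged on Python 2.7.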