Question

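I have two text files that I want to compare using Python. Both files have a Date line in their header, and I want to ignore that line when comparing: it will always differ and should not be counted as a difference.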

File1

Date : 04/29/2013
Some Text
More Text
....

File2

Date : 04/28/2013
Some Text
More Text
....

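I tried comparing them with the filecmp module, but it has no argument for ignoring lines that match a pattern. Is there another module that can be used for this? I tried difflib, but without success. Also, I only want to know whether the files differ, as True or False; difflib was printing every line, using whitespace, even when there was no difference.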

Answer 1

Use itertools.ifilter (or just the built-in filter in Python 3):

itertools.ifilter(predicate, iterable)

Your predicate should be a function that returns False for the lines you want to ignore. For example:

def predicate(line):
    if 'something' in line:
        return False # ignore it
    return True

Then apply it to your file object: fin = ifilter(predicate, fin)

Then use something like:

from itertools import izip, ifilter # on Py3 instead use normal zip and filter
f1 = ifilter(predicate, f1)
f2 = ifilter(predicate, f2)

all(x == y for x, y in izip(f1, f2))

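You don't need difflib unless you want to see what the differences are, and since you already tried filecmp I assume you only want to know whether there are any differences. Unfortunately, filecmp only works with filenames.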
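One caveat with the snippet above: izip (and zip on Python 3) stops at the shorter input, so a file with extra trailing lines would still compare as equal. A minimal Python 3 sketch that also catches a length mismatch, assuming the ignored lines start with 'Date' (the names keep and same_ignoring are made up for illustration):

```python
from itertools import zip_longest


def keep(line):
    """Return False for lines to ignore (assumed here: the Date header)."""
    return not line.startswith('Date')


def same_ignoring(lines1, lines2, predicate=keep):
    """True if both line iterables match after dropping ignored lines."""
    sentinel = object()  # pads the shorter input; never equal to a real line
    pairs = zip_longest(filter(predicate, lines1),
                        filter(predicate, lines2),
                        fillvalue=sentinel)
    return all(x == y for x, y in pairs)
```

Open file objects can be passed directly, since they iterate line by line: with open('File1.txt') as f1, open('File2.txt') as f2: print(same_ignoring(f1, f2))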

Similarly, to skip the first line of each file, just use itertools.islice(fin, 1, None).

from itertools import ifilter, izip # on Py3 instead use normal zip and filter

def predicate(line):
    ''' you can add other general checks in here '''
    if line.startswith('Date'):
        return False # ignore it
    return True

with open('File1.txt') as f1, open('File2.txt') as f2:
    f1 = ifilter(predicate, f1)
    f2 = ifilter(predicate, f2)
    print(all(x == y for x, y in izip(f1, f2)))

>>> True
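The islice variant mentioned above is not spelled out, so here is a rough Python 3 sketch of it. It assumes the Date line is always the first line of each file; the function name same_after_first_line is hypothetical.

```python
from itertools import islice, zip_longest


def same_after_first_line(lines1, lines2):
    """Compare two line iterables, skipping the first line of each."""
    rest1 = islice(lines1, 1, None)  # everything from the second line on
    rest2 = islice(lines2, 1, None)
    sentinel = object()  # pads the shorter input so extra lines count as a difference
    return all(x == y
               for x, y in zip_longest(rest1, rest2, fillvalue=sentinel))
```

This never inspects the content of the first line, so it only fits when the header position is fixed; the predicate approach is the safer choice if the Date line can move.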

Answer 2

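If you know the date is always on the first line and you read the lines into a list of strings, you can drop the first line by writing lines[1:].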
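A small sketch of that idea, assuming both files fit comfortably in memory and the Date line is always first (same_without_header is just an illustrative name):

```python
def same_without_header(lines1, lines2):
    """Compare two sequences of lines, ignoring the first line of each."""
    # list() accepts an open file object as well as a list of strings
    return list(lines1)[1:] == list(lines2)[1:]
```

With real files this would be called as same_without_header(f1.readlines(), f2.readlines()) inside a with open(...) block.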

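Added after a comment:

It is probably best to use ifilter as in the other solution. If the ignored lines can fall at different positions in the two files, you have to walk through both files (with two indices, one per file) and skip any line that contains one of the keywords.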
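For what it's worth, the two-index walk could look roughly like this. It is only a sketch: it assumes the lines are already loaded into lists, and the keywords tuple and the function name are placeholders.

```python
def same_skipping_keywords(lines1, lines2, keywords=('Date',)):
    """Walk two lists of lines with separate indices, skipping keyword lines."""
    def ignored(line):
        return any(keyword in line for keyword in keywords)

    i = j = 0
    while True:
        # advance each index past lines that should be ignored
        while i < len(lines1) and ignored(lines1[i]):
            i += 1
        while j < len(lines2) and ignored(lines2[j]):
            j += 1
        # if either list is exhausted, they match only if both are
        if i == len(lines1) or j == len(lines2):
            return i == len(lines1) and j == len(lines2)
        if lines1[i] != lines2[j]:
            return False
        i += 1
        j += 1
```

Because each side skips its own ignored lines, the two files may contain a different number of Date lines in different places and still compare as equal.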

Ignoring lines when comparing files with Python
