How to ignore a field's data when comparing two files in Python

Time: 2016-06-28 16:04:57

Tags: python

The input files look like this. The field schema is Mode|Date|Count|timestamp|status|insertTimeStamp

test1.txt:
HR|06/08/2016|3000|Thu Jun 09 2016|Complete|20160627020300
HR|06/08/2016|2000|Thu Jun 09 2016|Complete|20160627020400
HR|06/08/2016|1000|Thu Jun 09 2016|Complete|20160627020500
test2.txt:
HR|06/08/2016|3010|Thu Jun 09 2016|Complete|20160627070300
HR|06/08/2016|2000|Fri Jun 09 2016|Complete|20160627080300
HR|06/08/2016|1500|Thu Jun 09 2016|Complete|20160627090300

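My requirement is to find the lines that differ between the two files, but the insertTimeStamp field (the last column) should be ignored during the comparison.

I tried the code below. It works, but it compares entire lines. Can someone suggest how my code could skip the insertTimeStamp field during the comparison?

Thanks in advance for your help.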

import difflib
import sys

with open('/tmp/test1.txt', 'r') as hosts0:
    with open('/tmp/test2.txt', 'r') as hosts1:
        diff = difflib.unified_diff(
            hosts0.readlines(),
            hosts1.readlines(),
            fromfile='hosts0',
            tofile='hosts1',
            n=0,
        )
        for line in diff:
            for prefix in ('---', '+++', '@@'):
                if line.startswith(prefix):
                    break
            else:
                sys.stdout.write(line[1:])

1 answer:

Answer 0: (score: 1)

You could simply slice off the last element of each line before passing the lines to the diff function:

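diff = difflib.unified_diff(
    # Drop the last field (insertTimeStamp). The trailing newline goes with it,
    # so add it back to keep one diff line per output line.
    ['|'.join(x.split('|')[:-1]) + '\n' for x in hosts0.readlines()],
    ['|'.join(x.split('|')[:-1]) + '\n' for x in hosts1.readlines()],
    fromfile='hosts0',
    tofile='hosts1',
    n=0,
)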
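
Combined with the filtering loop from the question, the idea above might look roughly like the following sketch. The names `strip_last_field` and `diff_ignoring_last_field` are made up for illustration and do not come from the question or from difflib. The sketch assumes the field to ignore is always the last one, and it works on lists of lines so it can be tried without touching the files in /tmp.

```python
import difflib


def strip_last_field(line, sep='|'):
    """Return the line without its last sep-delimited field, newline-terminated."""
    return line.rstrip('\n').rsplit(sep, 1)[0] + '\n'


def diff_ignoring_last_field(lines_a, lines_b):
    """Return the differing lines (last field removed), without diff headers."""
    diff = difflib.unified_diff(
        [strip_last_field(x) for x in lines_a],
        [strip_last_field(x) for x in lines_b],
        fromfile='hosts0',
        tofile='hosts1',
        n=0,
    )
    # Skip the '---', '+++' and '@@' header lines, as the question's loop does,
    # then drop the leading '-' or '+' marker from each remaining line.
    return [line[1:] for line in diff
            if not line.startswith(('---', '+++', '@@'))]
```

To use it with the files from the question, pass in the results of `readlines()` for `/tmp/test1.txt` and `/tmp/test2.txt`. Like the original loop, this would also skip a data line whose content happens to begin with `--` or `++`; that cannot occur with the sample data, where every line starts with `HR`.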

Or do a line-by-line comparison without difflib:

# Read both files into lists of lines
with open('/tmp/test1.txt', 'r') as fh:
    hosts1 = fh.readlines()
with open('/tmp/test2.txt', 'r') as fh:
    hosts2 = fh.readlines()

# Compare corresponding lines, ignoring the last field (insertTimeStamp)
for h1, h2 in zip(hosts1, hosts2):
    if h1.split('|')[:-1] != h2.split('|')[:-1]:
        print('Lines are not the same!')
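
The loop above only says that some pair of lines differs, and `zip` silently stops at the end of the shorter file. A possible variation that reports which lines differ and also flags extra lines is sketched below. It assumes Python 3 (`itertools.zip_longest`), and the function name `differing_lines` is hypothetical.

```python
from itertools import zip_longest


def differing_lines(lines_a, lines_b, sep='|'):
    """Yield (line_number, a, b) for lines that differ once the last field is ignored."""
    pairs = zip_longest(lines_a, lines_b)  # pads the shorter input with None
    for number, (a, b) in enumerate(pairs, start=1):
        # A line missing from either input always counts as a difference.
        if a is None or b is None or a.split(sep)[:-1] != b.split(sep)[:-1]:
            yield number, a, b
```

With the two sample files from the question, all three line pairs would be reported, because each pair differs in either the Count or the timestamp field.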