输入文件如下所示,字段架构asMode|Date|Count|timestamp|status|insertTimeStamp
test1.txt:
HR|06/08/2016|3000|Thu Jun 09 2016|Complete|20160627020300
HR|06/08/2016|2000|Thu Jun 09 2016|Complete|20160627020400
HR|06/08/2016|1000|Thu Jun 09 2016|Complete|20160627020500
test2.txt:
HR|06/08/2016|3010|Thu Jun 09 2016|Complete|20160627070300
HR|06/08/2016|2000|Fri Jun 09 2016|Complete|20160627080300
HR|06/08/2016|1500|Thu Jun 09 2016|Complete|20160627090300
现在我的要求是比较两个文件之间的差异线,但在比较时应该忽略insertTimeStamp
字段(最后一列数据)。
我试过下面的代码。它的工作正常,但它逐行比较。有人可以建议我在比较时我的代码如何跳过insertTimeStamp
字段?
先谢谢你的帮助。
import difflib
import sys
with open('/tmp/test1.txt', 'r') as hosts0:
with open('/tmp/test2.txt', 'r') as hosts1:
diff = difflib.unified_diff(
hosts0.readlines(),
hosts1.readlines(),
fromfile='hosts0',
tofile='hosts1',
n=0,
)
for line in diff:
for prefix in ('---', '+++', '@@'):
if line.startswith(prefix):
break
else:
sys.stdout.write(line[1:])
答案 0 :(得分:1)
在将它们传递给diff函数
之前,你可能只是切掉每一行中的最后一个元素diff = difflib.unified_diff(
['|'.join(x.split('|')[:-1]) for x in hosts0.readlines()],
['|'.join(x.split('|')[:-1]) for x in hosts1.readlines()],
fromfile='hosts0',
tofile='hosts1',
n=0,
)
使用difflib进行逐行比较:
with open('/tmp/test1.txt', 'r') as fh:
hosts1 = fh.readlines()
with open('/tmp/test2.txt', 'r') as fh:
hosts2 = fh.readlines()
for h1, h2 in zip(hosts1, hosts2):
if h1.split('|')[:-1] != h2.split('|')[:-1]:
print 'Lines are not the same!'