Get the difference between two files

Time: 2016-03-13 12:09:23

Tags: python-3.x

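I have two files, and I need the lines that differ between them. The lines in the two files are not in the same order.

I tried the following script: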

file1 = open("test1.txt","r")
file2 = open("test2.txt","r")

lines1 = file1.readlines()

for i,lines2 in enumerate(file2):
    if lines2 != lines1[i]:
        print ("line ", i, " in File2 is different \n")
        print (lines2)
    else:
        print ("Its similar")

However, this only compares lines that have the same line number in both files.

Examples of my files:

File1:
User 1 is Sam and PC in VLAN Trust
User10 is Tom and PC in VLAN Sales
Harry is User 6 and in VLAN Fin
File2:
Harry is User 6 and in VLAN Fin
User 1 is Sam and PC in VLAN Trust
User10 is Tom and PC in VLAN Sales
User20 is Donald and VLAN is Trust

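I want the output to tell me which lines are missing from File1. Any line that appears in both files should not be listed as a difference, even if its line number differs between the files.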
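Put another way, the requirement is an order-insensitive membership test: keep each line of File2 that occurs nowhere in File1. A minimal sketch, assuming both files are already loaded as lists of strings without line endings (the function name `missing_from_file1` is hypothetical); it uses a set for fast lookups and keeps File2's order:

```python
def missing_from_file1(file1_lines, file2_lines):
    """Return the lines of file2 that appear nowhere in file1, in file2's order."""
    seen = set(file1_lines)  # set membership is O(1) on average, unlike a list scan
    return [line for line in file2_lines if line not in seen]


# Sample data from the question
file1_lines = [
    "User 1 is Sam and PC in VLAN Trust",
    "User10 is Tom and PC in VLAN Sales",
    "Harry is User 6 and in VLAN Fin",
]
file2_lines = [
    "Harry is User 6 and in VLAN Fin",
    "User 1 is Sam and PC in VLAN Trust",
    "User10 is Tom and PC in VLAN Sales",
    "User20 is Donald and VLAN is Trust",
]

print(missing_from_file1(file1_lines, file2_lines))
# ['User20 is Donald and VLAN is Trust']
```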

4 answers:

Answer 0 (score: 1)

with open('file1.txt', 'r') as f:
    lines1 = f.readlines()
with open('file2.txt', 'r') as f:
    lines2 = f.readlines()

diff = False
# enumerate yields (index, line) pairs directly
for idx, line in enumerate(lines2):
    if line not in lines1:
        print("line %d of file2 is missing in file1:\n%s" % (idx, line))
        diff = True
if not diff:
    print("similar")
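One pitfall with comparing raw `readlines()` output: every line keeps its trailing `\n`, so if the last line of one file has no newline, it will not match an otherwise identical line in the other file. A sketch under that assumption (the helper name `lines_missing` is hypothetical, and the demo writes throwaway files to a temporary directory) that strips the line ending before comparing and puts file 1's lines in a set:

```python
import os
import tempfile


def lines_missing(path1, path2):
    """Return (index, text) for each line of path2 that is not in path1."""
    with open(path1) as f:
        # rstrip("\n") removes the line ending so "abc" and "abc\n" compare equal
        lines1 = {line.rstrip("\n") for line in f}
    with open(path2) as f:
        return [
            (idx, line.rstrip("\n"))
            for idx, line in enumerate(f)
            if line.rstrip("\n") not in lines1
        ]


# Demo: file1's last line deliberately has no trailing newline
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "file1.txt")
    p2 = os.path.join(tmp, "file2.txt")
    with open(p1, "w") as f:
        f.write("User 1 is Sam and PC in VLAN Trust\nHarry is User 6 and in VLAN Fin")
    with open(p2, "w") as f:
        f.write("Harry is User 6 and in VLAN Fin\nUser20 is Donald and VLAN is Trust\n")
    result = lines_missing(p1, p2)

for idx, line in result:
    print("line %d of file2 is missing in file1:\n%s" % (idx, line))
```

Without the `rstrip`, the "Harry" line would be reported as missing because only one copy of it ends in a newline.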

Answer 1 (score: 1)

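Open the files and read the lines.

Then loop over the lines and compare each line in file2 with the lines in file1. If the line exists in both files, the variable inboth becomes True.

I added a print call at the end so I could check that it works. Just change the variable names to match the ones you use, then add this to your current program. Hope this helps.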

# the with-statement closes both files automatically
with open("file1.txt", "r") as f1, open("file2.txt", "r") as f2:
    lines1 = f1.readlines()
    lines2 = f2.readlines()

for i in lines2:
    inboth = False
    for x in lines1:
        if i == x:
            inboth = True
            break  # match found; no need to scan the rest of file 1
    if not inboth:
        print("The line: \n", i, "\nis in file 2 but not file 1\n")
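The `inboth` flag can also be expressed with Python's `for ... else` clause: the `else` block runs only when the loop finishes without hitting `break`. A sketch of the same nested comparison on in-memory lists (the function name `report_missing` is hypothetical):

```python
def report_missing(lines1, lines2):
    """Collect each line of lines2 that has no equal line in lines1."""
    missing = []
    for i in lines2:
        for x in lines1:
            if i == x:
                break  # found in file 1, so the else clause is skipped
        else:
            # the inner loop ran to completion without finding a match
            missing.append(i)
    return missing


for line in report_missing(["a\n", "b\n"], ["b\n", "c\n"]):
    print("The line: \n", line, "\nis in file 2 but not file 1\n")
```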

Answer 2 (score: 1)

You could try something like this:

file1 = open("test1.txt","r")
file2 = open("test2.txt","r")

lines1 = file1.readlines()
lines2 = file2.readlines()

for i, line in enumerate(lines2):
    if line not in lines1:
        print("Line {} in file 2 is not in file 1".format(i))

for i, line in enumerate(lines1):
    if line not in lines2:
        print("Line {} in file 1 is not in file 2".format(i))

file1.close()
file2.close()

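This checks both directions between the two files. Line numbers start at zero; you can change that by writing i+1 in the format arguments. Also remember to close the files once your script is done with them.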
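Both suggestions can be folded into the code: `enumerate(..., start=1)` produces 1-based line numbers without writing `i+1`, and a `with` block closes the files automatically, even if an exception is raised. A sketch assuming the same `test1.txt`/`test2.txt` file names; the helpers `one_way_diff` and `compare_files` are hypothetical:

```python
def one_way_diff(src, other, src_name, other_name):
    """Messages for each line of src absent from other, numbered from 1."""
    return [
        "Line {} in {} is not in {}".format(i, src_name, other_name)
        for i, line in enumerate(src, start=1)  # start=1 gives 1-based numbering
        if line not in other
    ]


def compare_files(path1, path2):
    # both files are closed as soon as the with-block exits
    with open(path1) as file1, open(path2) as file2:
        lines1 = file1.readlines()
        lines2 = file2.readlines()
    for msg in one_way_diff(lines2, lines1, "file 2", "file 1"):
        print(msg)
    for msg in one_way_diff(lines1, lines2, "file 1", "file 2"):
        print(msg)
```

Usage would be `compare_files("test1.txt", "test2.txt")`.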

Answer 3 (score: 1)

Your best option is to use difflib, which is a built-in Python module. Here is an example:

import difflib

file1_lines = [
    'User 1 is Sam and PC in VLAN Trust',
    'User10 is Tom and PC in VLAN Sales',
    'Harry is User 6 and in VLAN Fin'
]

file2_lines = [
    'Harry is User 6 and in VLAN Fin',
    'User 1 is Sam and PC in VLAN Trust',
    'User10 is Tom and PC in VLAN Sales',
    'User20 is Donald and VLAN is Trust'
]

differ = difflib.Differ()
diffs = list(differ.compare(file1_lines, file2_lines))

for diff in diffs:
    print(diff)

Output:

+ Harry is User 6 and in VLAN Fin
  User 1 is Sam and PC in VLAN Trust
  User10 is Tom and PC in VLAN Sales
- Harry is User 6 and in VLAN Fin
+ User20 is Donald and VLAN is Trust

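According to the docs for Differ, the two-letter code at the start of each line means:

  • '- ': line unique to sequence 1
  • '+ ': line unique to sequence 2
  • '  ': line common to both sequences
  • '? ': line not present in either input sequence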

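Here, "sequence 1" is the first argument to differ.compare() and "sequence 2" is the second; both are lists of strings to be compared.

In terms I find easier to understand: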

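  • Lines starting with '+ ' are lines in file2_lines that are not in file1_lines (i.e. added lines)
  • Lines starting with '- ' are lines in file1_lines that are not in file2_lines
  • Lines starting with '? ' are hint lines marking where a line changed; they appear only when a removed line and an added line are similar enough (above a certain similarity threshold)
  • Lines starting with '  ' are lines that are unchanged between the two sets of lines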
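To see a '? ' line in practice, compare two nearly identical lines. A small sketch using a sample line and a copy with its last character dropped; the exact column of the change marker is computed by difflib, so no expected output is reproduced here:

```python
import difflib

a = ["User20 is Donald and VLAN is Trust"]
b = ["User20 is Donald and VLAN is Trus"]  # same line, last character removed

diffs = list(difflib.Differ().compare(a, b))
for d in diffs:
    # print the '- ', '+ ' and '? ' lines; '? ' lines already end in a newline
    print(d.rstrip("\n"))
```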

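Edit

I see that in my output, the line Harry is user... is not shown as unchanged. If I now understand you correctly, you want it to be shown as unchanged. You can fix this by sorting the lists of strings first and then comparing the sorted lists. Just change the line that calls compare to the following: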

diffs = list(differ.compare(sorted(file1_lines), sorted(file2_lines)))
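Combining the sorted comparison with a filter on the two-letter prefix yields only the lines the question asks for. A sketch using the sample data (the variable name `only_in_file2` is an illustration); note that sorting discards the original line order, so the result comes out in sorted order:

```python
import difflib

file1_lines = [
    "User 1 is Sam and PC in VLAN Trust",
    "User10 is Tom and PC in VLAN Sales",
    "Harry is User 6 and in VLAN Fin",
]
file2_lines = [
    "Harry is User 6 and in VLAN Fin",
    "User 1 is Sam and PC in VLAN Trust",
    "User10 is Tom and PC in VLAN Sales",
    "User20 is Donald and VLAN is Trust",
]

differ = difflib.Differ()
diffs = list(differ.compare(sorted(file1_lines), sorted(file2_lines)))

# '+ ' marks lines present only in the second argument (file 2);
# d[2:] strips the two-letter code
only_in_file2 = [d[2:] for d in diffs if d.startswith("+ ")]
print(only_in_file2)  # ['User20 is Donald and VLAN is Trust']
```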