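Confused about XML comparison using Python

Date: 2015-07-08 11:36:44

Tags: python xml lxml

I want to compare two XML files of the format shown below in Python, and I would like some feedback on my approach.

File 1: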

<p1:car>                           
    <p1:feature car="111" type="color">511</p1:feature>
    <p1:feature car="223" type="color">542</p1:feature>
    <p1:feature car="299" type="color">559</p1:feature>
    <p1:feature car="323" type="color">564</p1:feature>
    <p1:feature car="353" type="color">564</p1:feature>
    <p1:feature car="391" type="color">570</p1:feature>
    <p1:feature car="448" type="color">570</p1:feature>

    <p1:feature car="111" type="tires" unit="percent">511</p1:feature>
    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="299" type="tires" unit="percent">516</p1:feature>
    <p1:feature car="323" type="tires" unit="percent">516</p1:feature>
    <p1:feature car="353" type="tires" unit="percent">518</p1:feature>
    <p1:feature car="391" type="tires" unit="percent">520</p1:feature>
    <p1:feature car="448" type="tires" unit="percent">520</p1:feature>
</p1:car>

File 2:

<p1:car>                           
    <p1:feature car="111" type="color">511</p1:feature>
    <p1:feature car="223" type="color">542</p1:feature>
    <p1:feature car="299" type="color">559</p1:feature>
    <p1:feature car="323" type="color">564</p1:feature>
    <p1:feature car="353" type="color">564</p1:feature>
    <p1:feature car="391" type="color">570</p1:feature>
    <p1:feature car="448" type="color">570</p1:feature>

    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="299" type="tires" unit="percent">516</p1:feature>
    <p1:feature car="323" type="tires" unit="percent">516</p1:feature>
    <p1:feature car="353" type="tires" unit="percent">518</p1:feature>
    <p1:feature car="391" type="tires" unit="percent">520</p1:feature>
    <p1:feature car="440" type="tires" unit="percent">520</p1:feature>
</p1:car>

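If you look closely, the second block of file 2 does not contain the line <p1:feature car="111" type="tires" unit="percent">511</p1:feature>, which is present in file 1.

Likewise, the last line of the second block in file 2 has car="440", whereas in file 1 it is car="448".

What I want:

The files I am working with contain many differences like these, so could you tell me how to print the missing lines and the unequal numbers from these files? I would like the output in the following form: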

In file two feature car="111", type="tires" and text = 511 is missing
In file two car="440" whereas in file one it is car="448"

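Also, please suggest any ideas or alternative approaches. I have been stuck on this problem for a long time and would like to resolve it as soon as possible.

What I tried:

I am using lxml for the comparison, and I tried a for loop in the following way: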

for i,j in zip(file1.getchildren(),file2.getchildren()):
        if (int(i.get("car")) & int(i.text)) != (int(j.get("car")) & int(j.text)):
               print "difference of both files"

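Because this compares the files line by line, I get wrong results for everything from the second block onwards, since one line is missing in the second file.

1 answer:

Answer 0 (score: 2)

I think what you are looking for is difflib. Please see the official documentation.

In general, what you want is: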

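from difflib import Differ
# "file1.xml" and "file2.xml" are placeholder names for the two inputs
with open("file1.xml") as file_1, open("file2.xml") as file_2:
    lines_1 = file_1.read().splitlines()  # XML contents of the first file
    lines_2 = file_2.read().splitlines()  # XML contents of the second file

# Differ.compare() expects two sequences of lines, not two whole strings
d = Differ()
result = d.compare(lines_1, lines_2)
print("\n".join(result))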
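
To make the Differ output more concrete, here is a minimal sketch that runs it on a shortened version of the tires lines from the question. The variable names are illustrative, and the lines are held in memory rather than read from files. Differ prefixes lines found only in the first input with "- " and lines found only in the second with "+ ".

```python
from difflib import Differ

# Shortened sample taken from the tires features in the question.
lines_1 = [
    '<p1:feature car="111" type="tires" unit="percent">511</p1:feature>',
    '<p1:feature car="223" type="tires" unit="percent">513</p1:feature>',
    '<p1:feature car="448" type="tires" unit="percent">520</p1:feature>',
]
lines_2 = [
    '<p1:feature car="223" type="tires" unit="percent">513</p1:feature>',
    '<p1:feature car="440" type="tires" unit="percent">520</p1:feature>',
]

diff = list(Differ().compare(lines_1, lines_2))

# "- " marks lines only in the first input, "+ " lines only in the second.
# Lines starting with "? " are Differ's intraline hints and are skipped here.
only_in_1 = [line[2:] for line in diff if line.startswith("- ")]
only_in_2 = [line[2:] for line in diff if line.startswith("+ ")]

for line in only_in_1:
    print("only in file 1:", line)
for line in only_in_2:
    print("only in file 2:", line)
```

Note that a changed attribute such as car="448" versus car="440" shows up as one removed line plus one added line, so turning it into a single "448 vs 440" message would need extra pairing logic on top of this.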

See the official documentation for more details on usage.
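
Since a line-based diff knows nothing about XML structure, another option is to key each feature by its (car, type) attributes and compare the two mappings, which avoids the misalignment caused by zip when a line is missing. The following is only a sketch under assumptions: it uses the standard-library xml.etree.ElementTree rather than lxml, the namespace URI http://example.com/p1 is a placeholder (the snippets in the question do not show the xmlns:p1 declaration), and the helper names index_features and report_differences are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/p1"  # placeholder namespace URI (assumption)

# Shortened samples based on the question, wrapped with the assumed namespace.
FILE_1 = f"""<p1:car xmlns:p1="{NS}">
    <p1:feature car="111" type="tires" unit="percent">511</p1:feature>
    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="448" type="tires" unit="percent">520</p1:feature>
</p1:car>"""

FILE_2 = f"""<p1:car xmlns:p1="{NS}">
    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="440" type="tires" unit="percent">520</p1:feature>
</p1:car>"""


def index_features(xml_text):
    """Map (car, type) to the element text for every feature element."""
    root = ET.fromstring(xml_text)
    return {
        (el.get("car"), el.get("type")): (el.text or "").strip()
        for el in root.iter(f"{{{NS}}}feature")
    }


def report_differences(xml_1, xml_2):
    """Return messages describing features that differ between the inputs."""
    one, two = index_features(xml_1), index_features(xml_2)
    messages = []
    for key, text in sorted(one.items()):
        car, kind = key
        if key not in two:
            messages.append(
                f'In file two feature car="{car}", type="{kind}" '
                f"and text = {text} is missing"
            )
        elif two[key] != text:
            messages.append(
                f'For car="{car}", type="{kind}" file one has text {text} '
                f"but file two has {two[key]}"
            )
    # Features that exist only in the second file.
    for car, kind in sorted(two.keys() - one.keys()):
        messages.append(
            f'In file one feature car="{car}", type="{kind}" '
            f"and text = {two[(car, kind)]} is missing"
        )
    return messages


messages = report_differences(FILE_1, FILE_2)
for message in messages:
    print(message)
```

With this keying, a renamed car such as 448 versus 440 is reported as one feature missing from file two and one missing from file one. Pairing the two into a single message would require an additional matching rule, for example comparing entries with the same type and text.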