Merging two XML files in Python: append the children of matching elements and carry over elements that appear in only one file

Time: 2017-01-20 10:00:15

Tags: python xml python-3.x lxml xml.etree

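I want to merge two XML files. I have read many solutions, but each is specific to the files in that question. I am using xml.etree.ElementTree as well as lxml to parse the files, compare them, and get the differences. As I understand it, my next step is: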

for element in file2.xml:
    if element present in file1.xml:
        append to output_file.xml
    else:
        copy element to the output_file

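However, I have not worked with XML before, and the available merge tools require a license, so I need to write a generic script that merges the files into the format I want.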
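The loop above could be sketched with xml.etree.ElementTree roughly as follows. This is only a minimal sketch under stated assumptions: only the direct children of the two root elements are merged, "present in file1.xml" is taken to mean "has the same tag name", and the function names `merge_children` and `merge_files` are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def merge_children(root1, root2):
    """Merge the children of root2 into root1 (in place) and return root1.

    Assumption: a child whose tag already occurs in root1 is placed right
    after the last original root1 child with that tag. A child with a new
    tag is placed just before the element that follows it in root2's order,
    or at the end if nothing follows it.
    """
    originals = list(root1)  # snapshot of file1's children before merging
    anchor = None            # element the next unmatched child goes before
    # Walk file2 backwards so each insert lands in front of the later ones.
    for child in reversed(list(root2)):
        child = copy.deepcopy(child)  # leave root2 untouched
        same_tag = [el for el in originals if el.tag == child.tag]
        current = list(root1)
        if same_tag:
            root1.insert(current.index(same_tag[-1]) + 1, child)
            anchor = same_tag[0]
        else:
            pos = current.index(anchor) if anchor is not None else len(current)
            root1.insert(pos, child)
            anchor = child
    return root1


def merge_files(path1, path2, out_path):
    """Hypothetical helper: merge path2 into path1 and write out_path."""
    tree1 = ET.parse(path1)
    merge_children(tree1.getroot(), ET.parse(path2).getroot())
    tree1.write(out_path, encoding="UTF-8", xml_declaration=True)
```

A call such as `merge_files('file1.xml', 'file2.xml', 'output_file.xml')` would write the merged tree. The whitespace between elements is carried over as-is, so the layout may look uneven; on Python 3.9 or later, `ET.indent` can tidy it before writing.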

file1.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>

    <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>

    <grandpa>
        <grandpa_name>grandpa_name_one_1</grandpa_name>
    </grandpa>
    <grandpa>
        <grandpa_name>grandpa_name_two_1</grandpa_name>
    </grandpa>

    <grandma>
        <grandma_name>grandma_name_one_1</grandma_name>
    </grandma>
    <grandma>
        <grandma_name>grandma_name_two_1</grandma_name>
    </grandma>

</great_grands>

file2.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>

    <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>


    <grandpa>
        <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
    </grandpa>

    <grandma>
        <grandma_name_2>grandma_name_one_2</grandma_name_2>
    </grandma>

</great_grands>

Required output:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>

    <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
    <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>

    <grandpa>
        <grandpa_name>grandpa_name_one_1</grandpa_name>
    </grandpa>
    <grandpa>
        <grandpa_name>grandpa_name_two_1</grandpa_name>
    </grandpa>

    <grandpa>
        <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
    </grandpa>

    <grandma>
        <grandma_name>grandma_name_one_1</grandma_name>
    </grandma>
    <grandma>
        <grandma_name>grandma_name_two_1</grandma_name>
    </grandma>

    <grandma>
        <grandma_name_2>grandma_name_one_2</grandma_name_2>
    </grandma>

</great_grands>

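1 answer:

Answer 0 (score: 0)

Consider XSLT, a special-purpose declarative language and a sibling of XPath, designed to transform XML files. With its document() function, a stylesheet can read an external XML file referenced by a relative path. Python's lxml module can run XSLT 1.0 scripts.

Because an XSLT script is itself a well-formed XML file, it can be parsed from a file or from an embedded string. The following assumes that all files and scripts are saved in the same directory:

XSLT script (save as an .xsl file; note that it references only file2.xml)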

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

 <xsl:template match="/great_grands">
   <xsl:copy>
     <xsl:copy-of select="great_grandpa_name_one"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/great_grandpa_name_two"/>
     <xsl:copy-of select="grandpa"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/grandpa"/>
     <xsl:copy-of select="grandma"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/grandma"/>
   </xsl:copy>
 </xsl:template>

</xsl:transform>

Python script (note that it references only file1.xml)

from lxml import etree

xml = etree.parse('file1.xml')
xsl = etree.parse('XSLTScript.xsl')

transform = etree.XSLT(xsl)
newdom = transform(xml)

# Save the transformed tree to a file, serialized according to <xsl:output>
with open('Output.xml', 'wb') as f:
    f.write(bytes(newdom))

Output

<?xml version="1.0" encoding="UTF-8"?>
<great_grands>
  <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
  <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
  <grandpa>
    <grandpa_name>grandpa_name_one_1</grandpa_name>
  </grandpa>
  <grandpa>
    <grandpa_name>grandpa_name_two_1</grandpa_name>
  </grandpa>
  <grandpa>
    <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
  </grandpa>
  <grandma>
    <grandma_name>grandma_name_one_1</grandma_name>
  </grandma>
  <grandma>
    <grandma_name>grandma_name_two_1</grandma_name>
  </grandma>
  <grandma>
    <grandma_name_2>grandma_name_one_2</grandma_name_2>
  </grandma>
</great_grands>
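
As mentioned above, the stylesheet can also be parsed from an embedded string. The following is a minimal sketch of that variant, not a definitive implementation. It assumes lxml is installed. Because a stylesheet parsed from a string has no file location of its own, a relative document('file2.xml') may not resolve as expected, so this sketch passes the second file's location as a file URI through a stylesheet parameter instead. The parameter name `file2_uri` and the function name `merge_with_xslt` are hypothetical.

```python
from pathlib import Path

from lxml import etree

# Same merge logic as the .xsl file, with the second file passed as a parameter.
XSLT_SOURCE = b"""\
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Hypothetical parameter: URI of the second file, supplied from Python -->
  <xsl:param name="file2_uri"/>

  <xsl:template match="/great_grands">
    <xsl:variable name="other" select="document($file2_uri)/great_grands"/>
    <xsl:copy>
      <xsl:copy-of select="great_grandpa_name_one"/>
      <xsl:copy-of select="$other/great_grandpa_name_two"/>
      <xsl:copy-of select="grandpa"/>
      <xsl:copy-of select="$other/grandpa"/>
      <xsl:copy-of select="grandma"/>
      <xsl:copy-of select="$other/grandma"/>
    </xsl:copy>
  </xsl:template>
</xsl:transform>
"""


def merge_with_xslt(file1_path, file2_path):
    """Transform file1 with the embedded stylesheet, pulling in file2."""
    transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
    # An absolute file URI does not depend on where the stylesheet lives.
    file2_uri = Path(file2_path).resolve().as_uri()
    return transform(
        etree.parse(str(file1_path)),
        file2_uri=etree.XSLT.strparam(file2_uri),
    )
```

The returned result tree can be written out the same way as in the script above, for example by writing `bytes(merge_with_xslt('file1.xml', 'file2.xml'))` to a file opened in binary mode.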