Save (print) an XML node and its parent nodes, but without its child nodes

Time: 2016-04-10 00:50:04

Tags: python xml lxml elementtree

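From an XML document, I want to save one node to a file, including all of its parent nodes but none of its child nodes. For example, given the following XML: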

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.1">
 <Document id="myid">
  <name>ref.kml</name>
  <Style id="normalState">
     <IconStyle><scale>1.0</scale><Icon><href>yt.png</href></Icon></IconStyle>    
  </Style>
 </Document>
</kml>

The expected output for the <Document> node would look like this:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.1">
 <Document id="myid">
 </Document>
</kml>

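So far, the only solution I have found is to iteratively remove all child elements before saving. However, since I still need the original XML afterwards, I have to copy the whole document first: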

#!/usr/bin/env python

import lxml.etree as ET # have to use [lxml] because [xml] doesn't support 'xml_declaration'
import copy

kml_file = ET.parse("myfile.kml")
kml_copied = copy.deepcopy(kml_file) # .copy() is not enough, need .deepcopy()
root = kml_copied.getroot()
my_node = root[0]
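for child in list(my_node):  # iterate over a snapshot; removing during live iteration can skip siblings
    my_node.remove(child)
# tostring() returns bytes when an encoding is given, so decode before printing
print(ET.tostring(kml_copied, xml_declaration=True, encoding='utf-8').decode('utf-8'))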

Is there a better way? At the very least, I would like to avoid deep-copying the whole file...

1 answer:

Answer 0 (score: 0)

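Consider XSLT, a declarative language designed specifically for transforming XML documents. Python's lxml module includes a built-in XSLT 1.0 processor. XSLT (whose scripts are themselves well-formed XML documents) can also handle the KML file's default, unprefixed namespace: the stylesheet binds the same namespace URI to a prefix of its own (doc here) and uses that prefix in its match patterns: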

XSLT script (save it as an .xsl file to load in Python; it can also be ported to other languages)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
               xmlns:doc="http://earth.google.com/kml/2.1">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <!-- Identity Transform to copy entire document -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty Template to Remove Nodes -->
  <xsl:template match="doc:Style|doc:name"/>

</xsl:transform>
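
The empty template above lists the children to drop by name (Style and name). As a minimal sketch, assuming every child of Document should be removed regardless of its name, the pattern can be changed to doc:Document/node(). This variant is not part of the original answer; the stylesheet and the sample KML are inlined as Python byte strings only to keep the example self-contained.

```python
import lxml.etree as ET

# Sample input from the question, inlined so the sketch is self-contained.
KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.1">
 <Document id="myid">
  <name>ref.kml</name>
  <Style id="normalState">
     <IconStyle><scale>1.0</scale><Icon><href>yt.png</href></Icon></IconStyle>
  </Style>
 </Document>
</kml>"""

# Variant stylesheet (an assumption, not the answer's script): the empty
# template matches every child node of doc:Document instead of listing names.
XSL = b"""<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
               xmlns:doc="http://earth.google.com/kml/2.1">
  <xsl:output encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="doc:Document/node()"/>
</xsl:transform>"""

transform = ET.XSLT(ET.fromstring(XSL))
result = transform(ET.fromstring(KML))

# The transformed <Document> keeps its attributes but has no children left.
stripped_doc = result.getroot()[0]
print(ET.tostring(result, encoding="UTF-8", xml_declaration=True).decode("utf-8"))
```

The child-axis pattern does not match attributes, so id="myid" survives, and the more specific pattern takes precedence over the identity template under XSLT 1.0's default priority rules.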

Python script

import lxml.etree as ET

# LOAD XML AND XSL 
dom = ET.parse('Input.xml')
xslt = ET.parse('XSLTScript.xsl')

# TRANSFORM INPUT INTO DOM OBJECT
transform = ET.XSLT(xslt)
newdom = transform(dom)

# OUTPUT DOM TO STRING
tree_out = ET.tostring(newdom,
                       encoding='UTF-8',
                       pretty_print=True,
                       xml_declaration=True)
print(tree_out.decode("utf-8"))

# SAVE RESULTING XML
with open('Output.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)

Output

<?xml version='1.0' encoding='UTF-8'?>
<kml xmlns="http://earth.google.com/kml/2.1">
  <Document id="myid"/>
</kml>
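
If the goal is only to avoid the deep copy mentioned in the question, another possible approach is to rebuild just the chain of ancestors in plain Python. The following is a rough sketch under stated assumptions, not the answer's method: the helper names bare_copy and node_with_ancestors are hypothetical, and the sketch copies only tags, attributes, and namespace maps, dropping all text, comments, and sibling elements.

```python
import lxml.etree as ET

# Sample input from the question, inlined so the sketch is self-contained.
KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.1">
 <Document id="myid">
  <name>ref.kml</name>
  <Style id="normalState">
     <IconStyle><scale>1.0</scale><Icon><href>yt.png</href></Icon></IconStyle>
  </Style>
 </Document>
</kml>"""


def bare_copy(element):
    """Hypothetical helper: copy tag, attributes and nsmap, but no children."""
    return ET.Element(element.tag, attrib=dict(element.attrib), nsmap=element.nsmap)


def node_with_ancestors(node):
    """Hypothetical helper: return a new root made of childless copies of
    `node` and each of its ancestors, nested in the original order."""
    current = bare_copy(node)
    for ancestor in node.iterancestors():  # nearest parent first, root last
        parent = bare_copy(ancestor)
        parent.append(current)
        current = parent
    return current


root = ET.fromstring(KML)
target = root[0]  # the <Document> element, selected as in the question
new_root = node_with_ancestors(target)

output = ET.tostring(new_root, encoding="UTF-8",
                     xml_declaration=True, pretty_print=True)
print(output.decode("utf-8"))
```

Only one small element per ancestor level is created, and the parsed input tree is left untouched, so it remains available for further processing.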