Parsing XML in Python and deleting containers

Time: 2017-09-20 17:15:58

Tags: python xml python-3.x xml-parsing autosar

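I'm trying to write a Python script that walks through the file and deletes containers based on the value of a specific node. For example, my tree looks like this: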

<collection shelf="New Arrivals">
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/xyz</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/xyz</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
</collection>

Q1

If the child node <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF"> has the text /AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport, the entire container should be deleted.

The script I wrote is:

import xml.etree.ElementTree as ET
tree = ET.parse('autosar1.xml')
root = tree.getroot()
for child in root.findall(".//ECUC-NUMERICAL-PARAM-VALUE"):
    for z in child.findall(".//DEFINITION-REF[@DEST='ECUC-BOOLEAN-PARAM-DEF']"):
        if z.text == "/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport":
            child.remove(z)         
tree.write('output.xml')

But I don't get the expected result. What I get is:

<collection shelf="New Arrivals">
<ECUC-NUMERICAL-PARAM-VALUE>
<SHORT-NAME>RTE_ABC</SHORT-NAME>
</ECUC-NUMERICAL-PARAM-VALUE>

<ECUC-NUMERICAL-PARAM-VALUE>
<SHORT-NAME>RTE_ABC</SHORT-NAME>
</ECUC-NUMERICAL-PARAM-VALUE>

<ECUC-NUMERICAL-PARAM-VALUE>
<SHORT-NAME>RTE_ABC</SHORT-NAME>
</ECUC-NUMERICAL-PARAM-VALUE>

<ECUC-NUMERICAL-PARAM-VALUE>
<SHORT-NAME>RTE_ABC</SHORT-NAME>
</ECUC-NUMERICAL-PARAM-VALUE>

<ECUC-NUMERICAL-PARAM-VALUE>
<SHORT-NAME>RTE_ABC</SHORT-NAME>
</ECUC-NUMERICAL-PARAM-VALUE>
</collection>

The result I want:

<collection shelf="New Arrivals">
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/xyz</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/xyz</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
</collection>

Q2

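Is it possible to achieve the desired output by taking user input at the command prompt, say only part of the value, instead of the entire attribute value?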

Thanks a lot.

1 answer:

Answer 0 (score: 1)

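Consider using the third-party lxml, the most feature-rich and easy-to-use library for processing XML and HTML in the Python language. You can install it with pip, or from a binary file on Windows. I recommend it because the module supports fully W3C-compliant XPath 1.0 and XSLT 1.0, and the latter, XSLT, is what will be useful for you.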
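To illustrate what full XPath 1.0 buys you, here is a minimal sketch (not part of the original answer) that selects the containers from the question with an XPath contains() predicate and removes each one through its parent. The sample document is trimmed from the question. ElementTree's built-in findall() only understands a limited XPath subset and has no contains(), which is one reason lxml helps here.

```python
import lxml.etree as et

# Trimmed copy of the question's sample document
SAMPLE = b"""<collection shelf="New Arrivals">
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/xyz</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
</collection>"""

root = et.fromstring(SAMPLE)

# Select containers whose DEFINITION-REF has the right DEST and whose text
# contains the search string anywhere. $needle is an XPath variable that
# lxml fills in from the keyword argument.
matches = root.xpath(
    "//ECUC-NUMERICAL-PARAM-VALUE[DEFINITION-REF[@DEST='ECUC-BOOLEAN-PARAM-DEF'"
    " and contains(., $needle)]]",
    needle="ComIPduCancellationSupport",
)

# xpath() returns a plain list, so removing elements while looping is safe.
for container in matches:
    container.getparent().remove(container)

print(et.tostring(root, encoding="unicode"))
```

Passing the search string as an XPath variable rather than concatenating it into the expression keeps quotes in the input from breaking the query.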

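XSLT is a special-purpose language designed to transform XML files, for example by conditionally removing nodes. Specifically, in XSLT we run the Identity Transform (which copies the whole document as is), and then apply an empty template to the nodes we want to remove. Note the use of contains() to check for the string anywhere in that node's text. This approach needs no for loop or if logic.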
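As a sketch of the identity transform plus empty template applied to the question's own sample (the answer's stylesheet targets INTEGER-TYPE and COMPU-METHOD-REF elements, which do not appear in that sample), the match pattern below is an adaptation for ECUC-NUMERICAL-PARAM-VALUE containers, not the answerer's code:

```python
import lxml.etree as et

# Trimmed copy of the question's sample document
SAMPLE = b"""<collection shelf="New Arrivals">
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/xyz</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
</collection>"""

XSL = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- IDENTITY TRANSFORM: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- EMPTY TEMPLATE: matching containers produce no output, so they are dropped -->
  <xsl:template match="ECUC-NUMERICAL-PARAM-VALUE[DEFINITION-REF[@DEST='ECUC-BOOLEAN-PARAM-DEF'
                       and contains(., 'ComIPduCancellationSupport')]]"/>
</xsl:stylesheet>"""

transform = et.XSLT(et.fromstring(XSL))
result = transform(et.fromstring(SAMPLE))

print(str(result))
```

Everything not matched by the empty template falls through to the identity template, which is why no explicit loop over the containers is needed.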

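With Python's lxml, we can build a dynamic XSLT script from a string (which, by the way, is itself an XML file) and pass in a string, e.g. CoolantTemp_T, for the COMPU-METHOD-REF contains() check. Such a string can come from user input. Note the {0} placeholder used by .format().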
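An alternative to pasting the user's string into the stylesheet with .format() is to declare an xsl:param and pass the value with lxml's XSLT.strparam(), which takes care of quote escaping. This is a swapped-in technique, not the answer's: because XSLT 1.0 does not allow variable references in match patterns, the $needle test moves from the match attribute into an xsl:if. The helper name drop_containers and the command-line handling are hypothetical.

```python
import sys
import lxml.etree as et

# Trimmed copy of the question's sample document
SAMPLE = b"""<collection shelf="New Arrivals">
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/xyz</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
</collection>"""

XSL = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:param name="needle"/>

  <!-- IDENTITY TRANSFORM -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy a container only if its DEFINITION-REF does NOT contain $needle -->
  <xsl:template match="ECUC-NUMERICAL-PARAM-VALUE">
    <xsl:if test="not(DEFINITION-REF[@DEST='ECUC-BOOLEAN-PARAM-DEF' and contains(., $needle)])">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>"""

TRANSFORM = et.XSLT(et.fromstring(XSL))


def drop_containers(doc, needle):
    """Return a transformed copy of doc without the containers whose
    DEFINITION-REF text contains needle (hypothetical helper)."""
    # strparam() quotes the string safely, even if it contains ' or "
    return TRANSFORM(doc, needle=et.XSLT.strparam(needle))


if __name__ == "__main__":
    # Read the search string from the command line, with a fallback default
    needle = sys.argv[1] if len(sys.argv) > 1 else "ComIPduCancellationSupport"
    print(str(drop_containers(et.fromstring(SAMPLE), needle)))
```

Usage would be along the lines of python drop.py ComIPduCancellationSupport (the script name is illustrative). Be aware that an empty string removes every container, since contains(x, '') is always true in XPath.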

Python



import lxml.etree as et
doc = et.parse('Input.xml')

xsl_str='''<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                                         xmlns:doc="http://autosar.org/3.0.2">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- IDENTITY TRANSFORM -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- EMPTY TEMPLATE -->
  <xsl:template match="INTEGER-TYPE[descendant::COMPU-METHOD-REF/@DEST='COMPU-METHOD' and 
                                    contains(descendant::COMPU-METHOD-REF, '{0}')]">    
  </xsl:template>

</xsl:stylesheet>'''

# LOAD DYNAMIC XSL STRING (PASSING BELOW STRING INTO ABOVE)
xsl = et.fromstring(xsl_str.format('CoolantTemp_T'))

transform = et.XSLT(xsl)
result = transform(doc)

# OUTPUT TO SCREEN
print(result)    
# OUTPUT TO FILE
with open('output.xml', 'wb') as f:
    f.write(result)
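For comparison, the question's ElementTree script can also be fixed without lxml. There, child.remove(z) detaches the DEFINITION-REF from its container instead of detaching the container from its parent. A minimal sketch, assuming (as in the sample) that the containers are direct children of the root element; the helper remove_containers and its exact flag are hypothetical. exact=True reproduces Q1's full-path comparison, and exact=False gives the partial match asked about in Q2.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample document
SAMPLE = """<collection shelf="New Arrivals">
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/xyz</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
  <ECUC-NUMERICAL-PARAM-VALUE>
    <SHORT-NAME>RTE_ABC</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-BOOLEAN-PARAM-DEF">/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport</DEFINITION-REF>
  </ECUC-NUMERICAL-PARAM-VALUE>
</collection>"""

FULL_PATH = "/AUTOSAR/EcucDefs/Com/ComConfig/ComIPdu/ComIPduCancellationSupport"


def remove_containers(root, needle, exact=False):
    """Remove ECUC-NUMERICAL-PARAM-VALUE children of root whose
    DEFINITION-REF text equals needle (exact=True) or contains it."""
    # Copy the list so removing elements does not disturb the iteration.
    for container in list(root.findall("ECUC-NUMERICAL-PARAM-VALUE")):
        for ref in container.findall("DEFINITION-REF[@DEST='ECUC-BOOLEAN-PARAM-DEF']"):
            text = ref.text or ""
            matched = (text == needle) if exact else (needle in text)
            if matched:
                # Remove the whole container from its parent (the root),
                # not the DEFINITION-REF from the container.
                root.remove(container)
                break


root = ET.fromstring(SAMPLE)
remove_containers(root, FULL_PATH, exact=True)
print(ET.tostring(root, encoding="unicode"))
```

To write the result to disk as in the question, ET.ElementTree(root).write('output.xml') does the job.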