lxml - Remove element if grand children have text

Time: 2019-02-18 00:27:36

Tags: python xpath lxml

I'm trying to remove an element (Data) if the value sub-element nested inside it has the text "1". I did some research and found how to remove the value element itself, but I have no idea how to remove that element's grandparent. I know I can find the text by searching like this and then remove the matched element, but that's all I could find.

for e in root.xpath('.//value[text()="1"]'):
    e.getparent().remove(e)  # removes only <value>, not the enclosing <Data>

My XML document looks like this:

<root>
  <Data>
    <FirstName>Name</FirstName>
    <EMail>email@email.com</EMail>
    <Number>123</Number>
    <delete>
      <value>0</value>
    </delete>
  </Data>
  <Data>
    <FirstName>Name</FirstName>
    <EMail>some@email.com</EMail>
    <delete>
      <value>1</value>
    </delete>
    <Number>456</Number>
  </Data>
</root>

Expected result:

<root>
  <Data>
    <FirstName>Name</FirstName>
    <EMail>email@email.com</EMail>
    <Number>123</Number>
    <delete>
      <value>0</value>
    </delete>
  </Data>
</root>

Basically, I want to remove a Data element if its value element contains specific text.

2 answers:

Answer 0 (score: 1)

Consider XSLT (a sibling of XPath), a special-purpose language designed to transform XML files into other XML. Besides XPath 1.0, Python's lxml module can also run XSLT 1.0 scripts.

XSLT (save as a .xsl file, which is itself a special kind of .xml file)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output version="1.0" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*"/>

  <!-- Identity Transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Removes entire Data node with delete child value equal to 1 -->
  <xsl:template match="Data[delete/value='1']"/>

</xsl:transform>

Python (no for loops or if logic)

import lxml.etree as et

# LOAD XML AND XSL
xml = et.parse('input.xml')
xsl = et.parse('xslt_script.xsl')

# TRANSFORM INPUT 
transform = et.XSLT(xsl)    
result = transform(xml)

# SAVE TO FILE
with open('output.xml', 'wb') as f:
    f.write(result)
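
For anyone who wants to try this without creating input.xml and xslt_script.xsl first, here is a minimal sketch that runs the same stylesheet from in-memory bytes. The names XSLT_SOURCE, XML_SOURCE and remaining are made up for this illustration, and the sample document is a trimmed version of the one in the question.

```python
from lxml import etree

# Same stylesheet as above, held in a bytes literal instead of a .xsl file.
XSLT_SOURCE = b"""\
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity transform: copy everything as-is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: drop any Data whose delete/value is 1 -->
  <xsl:template match="Data[delete/value='1']"/>
</xsl:transform>
"""

# Trimmed copy of the question's document (illustrative only).
XML_SOURCE = b"""\
<root>
  <Data>
    <EMail>email@email.com</EMail>
    <delete><value>0</value></delete>
  </Data>
  <Data>
    <EMail>some@email.com</EMail>
    <delete><value>1</value></delete>
  </Data>
</root>
"""

# Build the transformer from the parsed stylesheet, then apply it.
transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
result = transform(etree.fromstring(XML_SOURCE))

# Collect the e-mail addresses that survived the transform.
remaining = [e.text for e in result.getroot().iter("EMail")]
print(remaining)
print(str(result))
```

Only the Data block whose delete/value is 0 should be left in the result; the identity template copies everything else unchanged.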

Answer 1 (score: 1)

If you want to remove an element, you need its parent. In this case, the parent of Data is root (which also happens to be the root element).

Instead of selecting value, use a predicate to select Data and remove it from root, like this...

Python

from lxml import etree

tree = etree.parse("test.xml")
root = tree.getroot()

for data in tree.xpath("./Data[delete/value='1']"):
    root.remove(data)

print(etree.tostring(tree, pretty_print=True).decode())

Printed output

<root>
  <Data>
    <FirstName>Name</FirstName>
    <EMail>email@email.com</EMail>
    <Number>123</Number>
    <delete>
      <value>0</value>
    </delete>
  </Data>
</root>

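If I need to remove an element, I hardly ever use getparent(); I select the parent specifically. If I needed to do a more complex transformation, I'd use XSLT as Parfait suggested.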
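
That said, if the Data elements are not all direct children of one known parent, getparent() may still be the simpler route. A rough sketch, assuming a hypothetical <group> wrapper that is not in the original document; the helper name remove_flagged and the $flag XPath variable are also just for illustration:

```python
from lxml import etree

# Hypothetical variant of the question's XML: Data is nested one level deeper.
XML_SOURCE = b"""\
<root>
  <group>
    <Data>
      <EMail>email@email.com</EMail>
      <delete><value>0</value></delete>
    </Data>
    <Data>
      <EMail>some@email.com</EMail>
      <delete><value>1</value></delete>
    </Data>
  </group>
</root>
"""


def remove_flagged(root, flag="1"):
    """Remove every <Data> whose delete/value equals `flag`; return the count."""
    # xpath() returns a plain list, so removing elements while looping is safe.
    # The keyword argument binds the $flag variable inside the expression.
    matches = root.xpath(".//Data[delete/value = $flag]", flag=flag)
    for data in matches:
        # Ask each match for its own parent instead of assuming it is root.
        data.getparent().remove(data)
    return len(matches)


root = etree.fromstring(XML_SOURCE)
removed = remove_flagged(root)
remaining = [e.text for e in root.iter("EMail")]
print(removed, remaining)
```

The predicate still selects Data directly, as in the answer above; the only change is that each match is removed from whatever element happens to contain it.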