Question

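I want to remove elements with a certain tag value and then write out the .xml file without any tags for those removed elements. Is creating a new tree my only option?

There are two options for removing/deleting an element:

clear() Resets an element. This function removes all subelements, clears all attributes, and sets the text and tail attributes to None.

At first I used this, and it does remove the data from the element, but I am still left with an empty element: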

# Remove all elements from the tree that are NOT "job" or "make" or "build" elements
log = open("debug.log", "w")
for el in root.iter("*"):

    if el.tag != "job" and el.tag != "make" and el.tag != "build":
        print("removed = ", el.tag, el.attrib, file=log)
        el.clear()
    else:
        print("NOT", el.tag, el.attrib, file=log)

log.close()
tree.write("make_and_job_tree.xml", short_empty_elements=False)

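The problem is that xml.etree.ElementTree.ElementTree.write() still writes out empty tags no matter what:

...The keyword-only short_empty_elements parameter controls the formatting of elements that contain no content. If True (the default), they are emitted as a single self-closed tag, otherwise they are emitted as a pair of start/end tags.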
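
For illustration, here is a minimal sketch of what that parameter changes. The two-element document is made up for the example, and ET.tostring() is used instead of write() so nothing touches the disk. Either setting still emits the cleared element; only its spelling differs.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-element document, used only for illustration.
root = ET.fromstring("<data><rank>1</rank><neighbor name='Austria'/></data>")

# clear() empties the element but leaves it attached to its parent.
root.find("neighbor").clear()

# Same tree, serialized with both settings of short_empty_elements.
short_form = ET.tostring(root, encoding="unicode", short_empty_elements=True)
long_form = ET.tostring(root, encoding="unicode", short_empty_elements=False)

print(short_form)  # the cleared element appears as one self-closed tag
print(long_form)   # the cleared element appears as a start/end tag pair
```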

Why is there no option to simply not print those empty tags? Whatever.

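So then I thought I might try

remove(subelement) Removes subelement from the element. Unlike the find* methods, this method compares elements based on instance identity, not on tag value or contents.

But this only operates on direct child elements.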
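
A small sketch of that limitation, using a made-up nested document: calling remove() on anything other than the direct parent fails, while calling it on the direct parent works.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: neighbor is a grandchild of the root.
root = ET.fromstring("<data><country><neighbor/></country></data>")
country = root.find("country")
neighbor = country.find("neighbor")

# remove() only looks at direct children, so the root cannot remove it.
try:
    root.remove(neighbor)
    removed_from_root = True
except ValueError:
    removed_from_root = False

# The direct parent can.
country.remove(neighbor)
remaining = [child.tag for child in country]

print(removed_from_root, remaining)
```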

So I would have to do something like:

for el in root.iter("*"):
    for subel in el:
        if subel.tag != "make" and subel.tag != "job" and subel.tag != "build":
            el.remove(subel)

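But there is a big problem here: I am invalidating the iterator by removing elements, right?

Would it be enough to simply check whether the element is empty by adding if subel?:

if subel and subel.tag != "make" and subel.tag != "job" and subel.tag != "build":

Or do I have to get a new iterator every time I invalidate the tree elements?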
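
For what it is worth, a small experiment illustrates the concern. This is only a sketch on a made-up three-child element: removing from the live element while looping over it skips the sibling that slides into the freed slot, whereas looping over a snapshot made with list() visits every original child.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with two adjacent neighbor children.
SAMPLE = "<country><neighbor/><neighbor/><rank>1</rank></country>"

# Removing while iterating over the live element skips a sibling.
unsafe = ET.fromstring(SAMPLE)
for child in unsafe:
    if child.tag == "neighbor":
        unsafe.remove(child)
unsafe_tags = [child.tag for child in unsafe]

# Iterating over a snapshot of the children visits all of them.
safe = ET.fromstring(SAMPLE)
for child in list(safe):
    if child.tag == "neighbor":
        safe.remove(child)
safe_tags = [child.tag for child in safe]

print(unsafe_tags)  # one neighbor survives
print(safe_tags)    # only rank is left
```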

Remember: I just want to write out the xml file without tags for the empty elements.

Here is an example.

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Suppose I want to remove every mention of neighbor. Ideally, I would want this output after the removal:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
    </country>
</data>

The problem is that when I run the code using clear() (see the first code block above) and write the result to a file, I get this:

<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor></neighbor><neighbor></neighbor></country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor></neighbor></country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor></neighbor><neighbor></neighbor></country>
</data>

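Notice that neighbor still appears.

I know I could easily run a regex over the output, but there has to be a way (or another Python api) to do this on the fly, instead of requiring me to touch my .xml file again.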

Answer 1

import lxml.etree as et

xml  = et.parse("test.xml")

for node in xml.xpath("//neighbor"):
    node.getparent().remove(node)


xml.write("out.xml",encoding="utf-8",xml_declaration=True)

Using ElementTree, we need to find the parents of the neighbor nodes, then find the neighbor nodes inside each parent and remove them:

from xml.etree import ElementTree as et

xml  = et.parse("test.xml")


for parent in xml.getroot().findall(".//neighbor/.."):
    for child in parent.findall("./neighbor"):
        parent.remove(child)


xml.write("out.xml",encoding="utf-8",xml_declaration=True)

Both will give you:

<?xml version='1.0' encoding='utf-8'?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        </country>
</data>
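
Note that the closing </country> tags in that output sit one level too deep: the whitespace that followed the last remaining child is kept when the neighbor elements go away. If that matters, one option is to re-indent before writing. The following is only a sketch, assuming Python 3.9 or later (where ET.indent exists) and a shortened, made-up sample document.

```python
import xml.etree.ElementTree as ET

# Shortened sample, same shape as the question's document.
root = ET.fromstring(
    "<data>\n"
    "    <country name='Singapore'>\n"
    "        <rank>4</rank>\n"
    "        <neighbor name='Malaysia' direction='N'/>\n"
    "    </country>\n"
    "</data>"
)

# Remove each neighbor through its direct parent.
for parent in root.findall(".//neighbor/.."):
    for child in parent.findall("./neighbor"):
        parent.remove(child)

# ET.indent (Python 3.9+) rewrites whitespace-only text/tail in place.
ET.indent(root, space="    ")
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```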

Using your attribute logic and a slightly modified xml, like the following:

x = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
           <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>"""

Using lxml:

import lxml.etree as et

xml = et.fromstring(x)

for node in xml.xpath("//neighbor[not(@make) and not(@job) and not(@build)]"):
    node.getparent().remove(node)
print(et.tostring(xml))

Will give you:

 <data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        </country>
</data>

The same logic in ElementTree:

from xml.etree import ElementTree as et

xml = et.parse("test.xml").getroot()

atts = {"build", "job", "make"}

for parent in xml.findall(".//neighbor/.."):
    # Only direct children can be removed from parent; findall() already returns a list.
    for child in parent.findall("./neighbor"):
        if not atts.issubset(child.attrib):
            parent.remove(child)
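
ElementTree elements have no getparent(), which is why the loop above goes through the parents first. A common alternative is to build a child-to-parent lookup once and then remove each node through it. This is only a sketch: the parent_of name and the shortened sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample with one neighbor of each kind.
SAMPLE = """<data>
    <country name="Singapore">
        <neighbor name="Costa Rica" make="foo" build="bar" job="blah"/>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""

root = ET.fromstring(SAMPLE)
required = {"build", "job", "make"}

# ElementTree elements do not know their parent, so build a lookup first.
parent_of = {child: parent for parent in root.iter() for child in parent}

# findall() returns a list, so removing while looping over it is safe.
for node in root.findall(".//neighbor"):
    if not required.issubset(node.attrib):
        parent_of[node].remove(node)

kept = [n.get("name") for n in root.iter("neighbor")]
print(kept)
```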

If you use iter:

from xml.etree import ElementTree as et

xml = et.parse("test.xml")

for parent in xml.getroot().iter("*"):
    parent[:] = [child for child in parent if child.tag != "neighbor"]
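
The slice assignment generalizes to a set of tags. Below is a sketch of a small helper; the name strip_tags and the sample document are made up for illustration. It snapshots the traversal with list() first, as a cautious choice, so the loop never walks a tree that is changing underneath it.

```python
import xml.etree.ElementTree as ET


def strip_tags(root, unwanted):
    """Remove every descendant of root whose tag is in `unwanted`, in place."""
    # Snapshot the traversal so we never iterate a tree that is being modified.
    for parent in list(root.iter()):
        parent[:] = [child for child in parent if child.tag not in unwanted]
    return root


# Hypothetical document with neighbor elements at two different depths.
doc = ET.fromstring(
    "<data><country><rank>1</rank><neighbor/>"
    "<info><neighbor/></info></country></data>"
)
strip_tags(doc, {"neighbor"})
result = ET.tostring(doc, encoding="unicode")
print(result)
```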

You can see that we get exactly the same output:

In [30]: !cat /home/padraic/untitled6/test.xml
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">#
      <neighbor name="Austria" direction="E"/>
        <rank>1</rank>
        <neighbor name="Austria" direction="E"/>
        <year>2008</year>
      <neighbor name="Austria" direction="E"/>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
In [31]: paste
def test():
    import lxml.etree as et
    xml = et.parse("/home/padraic/untitled6/test.xml")
    for node in xml.xpath("//neighbor"):
        node.getparent().remove(node)
    a = et.tostring(xml)
    from xml.etree import ElementTree as et
    xml = et.parse("/home/padraic/untitled6/test.xml")
    for parent in xml.getroot().iter("*"):
        parent[:] = (child for child in parent if child.tag != "neighbor")
    b = et.tostring(xml.getroot())
    assert  a == b

## -- End pasted text --

In [32]: test()

Answer 2

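Whenever you need to modify XML documents, also consider XSLT, the special-purpose language that is part of the XSL family, which also includes XPath. XSLT is designed specifically to transform XML files. Pythoners are not quick to recommend it, but it avoids the need for loops or nested if/then logic in general-purpose code. Python's lxml module can run XSLT 1.0 scripts using the libxslt processor.

The transform below runs the identity transform to copy the document as is, then uses an empty template match on <neighbor> to remove it:

XSLT Script (save it as an .xsl file and load it just like the source .xml, since both are well-formed xml files)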

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <!-- IDENTITY TRANSFORM TO COPY XML AS IS -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- EMPTY TEMPLATE TO REMOVE NEIGHBOR WHEREVER IT EXISTS -->  
  <xsl:template match="neighbor"/>

</xsl:transform>

Python Script

import lxml.etree as et

# LOAD XML AND XSL DOCUMENTS
xml  = et.parse("Input.xml")
xslt = et.parse("Script.xsl")

# TRANSFORM TO NEW TREE
transform = et.XSLT(xslt)
newdom = transform(xml)

# CONVERT TO STRING
tree_out = et.tostring(newdom, encoding='UTF-8', pretty_print=True,  xml_declaration=True)

# OUTPUT TO FILE
xmlfile = open('Output.xml', 'wb')
xmlfile.write(tree_out)
xmlfile.close()

Answer 3

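The trick here is to find the parent (the country node) and remove the neighbor from there. In this example I am using ElementTree because I am somewhat familiar with it: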

import xml.etree.ElementTree as ET

if __name__ == '__main__':
    with open('debug.log') as f:
        doc = ET.parse(f)

        for country in doc.findall('.//country'):
            for neighbor in country.findall('neighbor'):
                country.remove(neighbor)

        ET.dump(doc)  # Display
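
Since the question asks for a file rather than a dump to the console, the same loop can be followed by write(). This is a sketch under a few assumptions: the sample string stands in for the real input, and the output goes to a temporary directory so the example does not overwrite anything.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Stand-in for the real input file; hypothetical, shortened sample.
SAMPLE = """<data>
    <country name="Panama">
        <rank>68</rank>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>"""

doc = ET.ElementTree(ET.fromstring(SAMPLE))

# Same approach as above: remove each neighbor through its country.
for country in doc.findall('.//country'):
    for neighbor in country.findall('neighbor'):
        country.remove(neighbor)

# Write to a throwaway location, then read it back to inspect the result.
out_path = os.path.join(tempfile.mkdtemp(), "out.xml")
doc.write(out_path, encoding="utf-8", xml_declaration=True)

with open(out_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```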
