Remove sub-elements from an XML tree based on a given list of attributes

Time: 2019-05-01 16:04:47

Tags: python xml recursion

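I'm trying to read in an XML file and remove some elements based on their attributes. My current code removes elements that match the list, but it does not remove consecutive matching elements, probably because removing elements while iterating over them disturbs the iteration. I haven't found a fix yet. Any ideas?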

def attri_remover(tree, remove_list):
    root = tree
    return_tree = tree
    for child in root:
        if child.attrib in remove_list:
            return_tree.remove(child)
        elif len(child) >= 1:
            child = attri_remover(child, remove_list)

    return return_tree

For example, given this XML:

<RECORDS>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>40342</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>94201</PVAL>
    </PROP>
    <PROP NAME="prod_available">
      <PVAL>42810932-1</PVAL>
    </PROP>
  </RECORD>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>94829</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>83921</PVAL>
    </PROP>
    <PROP NAME="prod_available">
      <PVAL>43901223-1</PVAL>
    </PROP>
  </RECORD>
</RECORDS>

I convert it to an element tree and pass this removal list:

[{'NAME': 'prod_available'}]

The function should return an element tree equivalent to:

<RECORDS>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>40342</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>94201</PVAL>
    </PROP>
  </RECORD>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>94829</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>83921</PVAL>
    </PROP>
  </RECORD>
</RECORDS>

1 Answer:

Answer 0: (score: 0)

One option is to select the elements to remove with XPath instead of iterating over every element.

You didn't specify whether you're using ElementTree or lxml (or something else entirely), so I chose lxml because XPath support in ElementTree is limited.

Here's an example...

XML input (input.xml)

<RECORDS>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>40342</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>94201</PVAL>
    </PROP>
    <PROP NAME="prod_available">
      <PVAL>42810932-1</PVAL>
    </PROP>
  </RECORD>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>94829</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>83921</PVAL>
    </PROP>
    <PROP NAME="prod_available">
      <PVAL>43901223-1</PVAL>
    </PROP>
  </RECORD>
</RECORDS>

Python

from lxml import etree


def attri_remover(input_tree, remove_list):
    for attr_name, attr_value in [(k, v) for attr in remove_list for (k, v) in attr.items()]:
        # XPath matches any element that contains an attribute with the same name and value.
        for target_element in input_tree.xpath(f"//*[@{attr_name}[.='{attr_value}']]"):
            target_element.getparent().remove(target_element)


tree = etree.parse("input.xml")

# Appears to be a list of dicts that contain attribute name/value pairs.
to_remove = [{'NAME': 'prod_available'}]

attri_remover(tree, to_remove)

tree.write("output.xml")

XML output (output.xml)

<RECORDS>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>40342</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>94201</PVAL>
    </PROP>
    </RECORD>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>94829</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>83921</PVAL>
    </PROP>
    </RECORD>
</RECORDS>

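Note: in my example the function modifies the original tree in place. If you want the function to return a separate tree instead, make a copy of the tree, modify the copy, and return it.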
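
The copy-then-modify idea could look roughly like the sketch below. It is written against the standard library's xml.etree.ElementTree rather than lxml, purely as an assumption so that it runs without extra dependencies. ElementTree elements have no getparent(), so the sketch visits each parent and removes matching children from a snapshot list; looping over a snapshot also avoids the skipped-sibling problem described in the question. The function name attri_remover_copy and the shortened sample XML are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def attri_remover_copy(root, remove_list):
    """Return a pruned deep copy of root; the original is left untouched."""
    new_root = copy.deepcopy(root)
    # list(...) takes a snapshot, so removing elements does not disturb the loops.
    for parent in list(new_root.iter()):
        for child in list(parent):
            # Same test as in the question: the whole attribute dict must match.
            if child.attrib in remove_list:
                parent.remove(child)
    return new_root


# Shortened, hypothetical sample input for illustration only.
xml_text = """<RECORDS>
  <RECORD>
    <PROP NAME="sort"><PVAL>40342</PVAL></PROP>
    <PROP NAME="prod_available"><PVAL>42810932-1</PVAL></PROP>
  </RECORD>
</RECORDS>"""

original = ET.fromstring(xml_text)
pruned = attri_remover_copy(original, [{"NAME": "prod_available"}])

print([p.get("NAME") for p in pruned.iter("PROP")])    # ['sort']
print([p.get("NAME") for p in original.iter("PROP")])  # ['sort', 'prod_available']
```

Because the match compares the entire attribute dict, an element with extra attributes besides NAME would not be removed. If only one attribute should decide, comparing child.get("NAME") against a set of values may be a better fit.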