Question

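I'm trying to parse an XML file and remove some elements based on their attributes. My current code does remove elements that match the list, but when several matching elements are adjacent it doesn't remove all of them. This is probably because I'm removing elements from the tree while iterating over it. I haven't found a fix yet. Any ideas?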

def attri_remover(tree, remove_list):
    root = tree
    return_tree = tree
    for child in root:
        if child.attrib in remove_list:
            return_tree.remove(child)
        elif len(child) >= 1:
            child = attri_remover(child, remove_list)

    return return_tree
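The skipping comes from removing children from the element you are looping over. Each `remove()` shifts the later siblings one position left while the loop keeps advancing, so the sibling right after a removed element is never checked. Below is a minimal sketch of the same recursive approach that loops over a snapshot (`list(parent)`) instead. It assumes the standard library's `xml.etree.ElementTree` and keeps the original `child.attrib in remove_list` test, which only matches when an element's entire attribute dict equals one of the dicts in the list.

```python
import xml.etree.ElementTree as ET


def attri_remover(parent, remove_list):
    """Recursively remove descendants of `parent` whose attribute dict equals
    one of the dicts in `remove_list`. Modifies `parent` in place."""
    # Loop over a snapshot of the children: removing from `parent` while
    # looping over it directly would skip the sibling after each removal.
    for child in list(parent):
        if child.attrib in remove_list:
            parent.remove(child)
        elif len(child):
            attri_remover(child, remove_list)
    return parent


# Two adjacent matching elements, the case the original loop mishandles.
xml = """<RECORDS>
  <RECORD>
    <PROP NAME="prod_available"><PVAL>1</PVAL></PROP>
    <PROP NAME="prod_available"><PVAL>2</PVAL></PROP>
    <PROP NAME="sort"><PVAL>3</PVAL></PROP>
  </RECORD>
</RECORDS>"""

root = ET.fromstring(xml)
attri_remover(root, [{'NAME': 'prod_available'}])
print([p.get("NAME") for p in root.iter("PROP")])  # ['sort']
```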

For example, given this XML:

<RECORDS>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>40342</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>94201</PVAL>
    </PROP>
    <PROP NAME="prod_available">
      <PVAL>42810932-1</PVAL>
    </PROP>
  </RECORD>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>94829</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>83921</PVAL>
    </PROP>
    <PROP NAME="prod_available">
      <PVAL>43901223-1</PVAL>
    </PROP>
  </RECORD>
</RECORDS>

I convert it to an ElementTree and pass this removal list:

[{'NAME': 'prod_available'}]

The function should return a tree equivalent to:

<RECORDS>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>40342</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>94201</PVAL>
    </PROP>
  </RECORD>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>94829</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>83921</PVAL>
    </PROP>
  </RECORD>
</RECORDS>

Answer 1

One option is to use XPath to select the elements to remove, instead of iterating over every element.

You didn't specify whether you're using ElementTree or lxml (or something else entirely), so I went with lxml, since XPath support in ElementTree is limited.

Here's an example...

XML Input (input.xml)

<RECORDS>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>40342</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>94201</PVAL>
    </PROP>
    <PROP NAME="prod_available">
      <PVAL>42810932-1</PVAL>
    </PROP>
  </RECORD>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>94829</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>83921</PVAL>
    </PROP>
    <PROP NAME="prod_available">
      <PVAL>43901223-1</PVAL>
    </PROP>
  </RECORD>
</RECORDS>

Python

from lxml import etree


def attri_remover(input_tree, remove_list):
    for attr_name, attr_value in [(k, v) for attr in remove_list for (k, v) in attr.items()]:
        # XPath matches any element that contains an attribute with the same name and value.
        for target_element in input_tree.xpath(f"//*[@{attr_name}[.='{attr_value}']]"):
            target_element.getparent().remove(target_element)


tree = etree.parse("input.xml")

# Appears to be a list of dicts that contain attribute name/value pairs.
to_remove = [{'NAME': 'prod_available'}]

attri_remover(tree, to_remove)

tree.write("output.xml")

XML Output (output.xml)

<RECORDS>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>40342</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>94201</PVAL>
    </PROP>
    </RECORD>
  <RECORD>
    <PROP NAME="sort">
      <PVAL>94829</PVAL>
    </PROP>
    <PROP NAME="prod_number">
      <PVAL>83921</PVAL>
    </PROP>
    </RECORD>
</RECORDS>
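For comparison, here is an option if you are on the standard library's `xml.etree.ElementTree` instead of lxml. Its XPath subset is limited, but it does support `[@name='value']` predicates and `..` for selecting a parent, which is enough for this task. The following is a minimal sketch under that assumption: `attri_remover_et` is a hypothetical name, it modifies the tree in place like the lxml version, and a value containing a single quote would break the path string.

```python
import xml.etree.ElementTree as ET


def attri_remover_et(root, remove_list):
    """Remove descendants of `root` that carry any listed attribute
    name/value pair, using ElementTree's limited XPath subset."""
    for attrs in remove_list:
        for name, value in attrs.items():
            match = f"[@{name}='{value}']"
            # ElementTree elements have no getparent(), so select the parents
            # of matching elements with "..", then remove the matching children.
            # findall() returns lists, so removing afterwards is safe.
            for parent in root.findall(f".//*{match}/.."):
                for child in parent.findall(f"*{match}"):
                    parent.remove(child)
    return root


root = ET.fromstring(
    "<RECORDS>"
    "<RECORD><PROP NAME='sort'/><PROP NAME='prod_available'/></RECORD>"
    "<RECORD><PROP NAME='prod_available'/><PROP NAME='prod_number'/></RECORD>"
    "</RECORDS>"
)
attri_remover_et(root, [{'NAME': 'prod_available'}])
print([p.get("NAME") for p in root.iter("PROP")])  # ['sort', 'prod_number']
```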

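Note: in my example, the function modifies the original tree in place. If you want the function to return a separate tree instead, copy the tree, modify the copy, and return it.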
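Following that note, here is a minimal sketch of a non-mutating version. It uses the standard library's `xml.etree.ElementTree` so it runs without lxml; that choice is an assumption, and with lxml you would keep `getparent()` instead of the parent map below. `attri_remover_copy` is a hypothetical name. `copy.deepcopy` makes the independent copy that gets modified and returned, and the matching rule mirrors the lxml answer: an element is removed if it has any listed name/value pair.

```python
import copy
import xml.etree.ElementTree as ET


def attri_remover_copy(root, remove_list):
    """Return a copy of `root` with matching elements removed; `root` is unchanged."""
    new_root = copy.deepcopy(root)
    pairs = [(k, v) for attrs in remove_list for k, v in attrs.items()]
    # ElementTree has no getparent(), so map every element of the copy to its parent.
    parent_of = {child: parent for parent in new_root.iter() for child in parent}
    for elem, parent in parent_of.items():
        if any(elem.get(k) == v for k, v in pairs):
            parent.remove(elem)
    return new_root


original = ET.fromstring(
    "<RECORDS><RECORD><PROP NAME='sort'/><PROP NAME='prod_available'/></RECORD></RECORDS>"
)
trimmed = attri_remover_copy(original, [{'NAME': 'prod_available'}])
print(len(original.find("RECORD")), len(trimmed.find("RECORD")))  # 2 1
```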

Remove sub-elements from an XML tree based on a given list of attributes
