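I'm trying to read in an XML file and remove some elements based on their attributes. The code I have so far removes elements that match a list, but it doesn't remove consecutive matching elements, probably because removing elements while iterating over them disturbs the iteration. I haven't found a fix yet. Any ideas?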
    def attri_remover(tree, remove_list):
        root = tree
        return_tree = tree
        for child in root:
            if child.attrib in remove_list:
                return_tree.remove(child)
            elif len(child) >= 1:
                child = attri_remover(child, remove_list)
        return return_tree
For example, given this XML:
    <RECORDS>
        <RECORD>
            <PROP NAME="sort">
                <PVAL>40342</PVAL>
            </PROP>
            <PROP NAME="prod_number">
                <PVAL>94201</PVAL>
            </PROP>
            <PROP NAME="prod_available">
                <PVAL>42810932-1</PVAL>
            </PROP>
        </RECORD>
        <RECORD>
            <PROP NAME="sort">
                <PVAL>94829</PVAL>
            </PROP>
            <PROP NAME="prod_number">
                <PVAL>83921</PVAL>
            </PROP>
            <PROP NAME="prod_available">
                <PVAL>43901223-1</PVAL>
            </PROP>
        </RECORD>
    </RECORDS>
which I convert to an element tree, and given the removal list:

    [{'NAME': 'prod_available'}]

the function should return an element tree equivalent to:
    <RECORDS>
        <RECORD>
            <PROP NAME="sort">
                <PVAL>40342</PVAL>
            </PROP>
            <PROP NAME="prod_number">
                <PVAL>94201</PVAL>
            </PROP>
        </RECORD>
        <RECORD>
            <PROP NAME="sort">
                <PVAL>94829</PVAL>
            </PROP>
            <PROP NAME="prod_number">
                <PVAL>83921</PVAL>
            </PROP>
        </RECORD>
    </RECORDS>
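Answer 0 (score: 0)
One option is to select the elements you want to remove with XPath instead of walking over every element.
You didn't specify whether you're using ElementTree or lxml (or something else entirely), so I chose lxml because XPath support in ElementTree is limited.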
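For comparison, here is a minimal sketch of what the standard-library ElementTree can do with its XPath subset. It relies on the `[@attrib='value']` predicate and the `..` parent step, both of which ElementTree supports. The function name `remove_by_attributes` is hypothetical, and the sketch assumes attribute values contain no single quotes, since they are interpolated into the path unescaped.

```python
import xml.etree.ElementTree as ET


def remove_by_attributes(root, remove_list):
    """Remove, in place, every descendant of root that carries a listed name/value pair."""
    for attrs in remove_list:
        for name, value in attrs.items():
            # Elements do not know their parent in ElementTree, so select the
            # parents of the matching elements with a trailing '..' step.
            for parent in root.findall(f".//*[@{name}='{value}']/.."):
                # Iterate over a snapshot so that removing a child
                # does not cause the next sibling to be skipped.
                for child in list(parent):
                    if child.get(name) == value:
                        parent.remove(child)


xml_text = """
<RECORDS>
  <RECORD>
    <PROP NAME="sort"><PVAL>40342</PVAL></PROP>
    <PROP NAME="prod_number"><PVAL>94201</PVAL></PROP>
    <PROP NAME="prod_available"><PVAL>42810932-1</PVAL></PROP>
  </RECORD>
  <RECORD>
    <PROP NAME="sort"><PVAL>94829</PVAL></PROP>
    <PROP NAME="prod_number"><PVAL>83921</PVAL></PROP>
    <PROP NAME="prod_available"><PVAL>43901223-1</PVAL></PROP>
  </RECORD>
</RECORDS>
"""

root = ET.fromstring(xml_text)
remove_by_attributes(root, [{"NAME": "prod_available"}])
remaining = [prop.get("NAME") for prop in root.iter("PROP")]
```

Unlike the asker's `child.attrib in remove_list` test, which requires the element's whole attribute dict to equal a list entry, this matches on a single attribute pair, as the lxml version does.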
Here is an example using lxml...
XML Input (input.xml)
    <RECORDS>
        <RECORD>
            <PROP NAME="sort">
                <PVAL>40342</PVAL>
            </PROP>
            <PROP NAME="prod_number">
                <PVAL>94201</PVAL>
            </PROP>
            <PROP NAME="prod_available">
                <PVAL>42810932-1</PVAL>
            </PROP>
        </RECORD>
        <RECORD>
            <PROP NAME="sort">
                <PVAL>94829</PVAL>
            </PROP>
            <PROP NAME="prod_number">
                <PVAL>83921</PVAL>
            </PROP>
            <PROP NAME="prod_available">
                <PVAL>43901223-1</PVAL>
            </PROP>
        </RECORD>
    </RECORDS>
Python
    from lxml import etree


    def attri_remover(input_tree, remove_list):
        for attr_name, attr_value in [(k, v) for attr in remove_list for (k, v) in attr.items()]:
            # XPath matches any element that contains an attribute with the same name and value.
            for target_element in input_tree.xpath(f"//*[@{attr_name}[.='{attr_value}']]"):
                target_element.getparent().remove(target_element)


    tree = etree.parse("input.xml")

    # Appears to be a list of dicts that contain attribute name/value pairs.
    to_remove = [{'NAME': 'prod_available'}]

    attri_remover(tree, to_remove)

    tree.write("output.xml")
XML Output (output.xml)
    <RECORDS>
        <RECORD>
            <PROP NAME="sort">
                <PVAL>40342</PVAL>
            </PROP>
            <PROP NAME="prod_number">
                <PVAL>94201</PVAL>
            </PROP>
        </RECORD>
        <RECORD>
            <PROP NAME="sort">
                <PVAL>94829</PVAL>
            </PROP>
            <PROP NAME="prod_number">
                <PVAL>83921</PVAL>
            </PROP>
        </RECORD>
    </RECORDS>
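Note: in my example, the function modifies the original tree. If you want the function to return a separate tree instead, copy the tree, modify the copy, and return that.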
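A minimal sketch of that copy-then-modify approach follows. It uses the standard-library ElementTree so it runs without lxml, although `copy.deepcopy` should behave the same way on lxml elements. The names `without_attributes` and `_prune` are hypothetical. The sketch keeps the question's recursive shape and its whole-dict `child.attrib in remove_list` test, but iterates over `list(node)` so that consecutive matches are not skipped.

```python
import copy
import xml.etree.ElementTree as ET


def without_attributes(root, remove_list):
    """Return a pruned deep copy of root; the original tree is left untouched."""
    pruned = copy.deepcopy(root)
    _prune(pruned, remove_list)
    return pruned


def _prune(node, remove_list):
    # list(node) is a snapshot of the children, so removing one of them
    # does not shift the sequence that the loop is walking over.
    for child in list(node):
        if child.attrib in remove_list:
            node.remove(child)
        else:
            _prune(child, remove_list)


# Two consecutive matching elements, the case the question's loop missed.
original = ET.fromstring(
    "<RECORDS><RECORD>"
    '<PROP NAME="sort"><PVAL>40342</PVAL></PROP>'
    '<PROP NAME="prod_available"><PVAL>42810932-1</PVAL></PROP>'
    '<PROP NAME="prod_available"><PVAL>43901223-1</PVAL></PROP>'
    "</RECORD></RECORDS>"
)

result = without_attributes(original, [{"NAME": "prod_available"}])
result_names = [prop.get("NAME") for prop in result.iter("PROP")]
original_names = [prop.get("NAME") for prop in original.iter("PROP")]
```

Deep-copying costs memory proportional to the tree size, so for very large documents modifying in place may be the more practical choice.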