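I have a list of nodes that I want to remove from an XML document, but I am having trouble removing the elements and writing the modified document to a new XML file.
Here is the Python program I wrote (I am using ElementTree):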
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('autogen_test.xml')
root = tree.getroot()
keeper_data = ['4294905264']
instances = tree.findall('./DIMENSION/DIMENSION_NODE/DIMENSION_NODE')
removeList = list()
for instance in instances:
    #print instance
    data1 = instance.find('./DVAL/DVAL_ID')
    if data1.attrib.get("ID") not in keeper_data:
        removeList.append(instance)
for tag in removeList:
    parent = tree.findall('./DIMENSION/DIMENSION_NODE/DIMENSION_NODE')
    parent.remove(tag)
tree.write("out.xml")
My sample XML is below (this is a standard input that I cannot modify):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DIMENSIONS SYSTEM "dimensions.dtd">
<DIMENSIONS>
  <NUM_DVALS>88816</NUM_DVALS>
  <DIMENSION NAME="Brand" SRC_FILE="" SRC_TYPE="INTERNAL">
    <DIMENSION_ID ID="4294905334"/>
    <DIMENSION_NODE>
      <DVAL TYPE="EXACT">
        <DVAL_ID ID="2"/>
        <SYN DISPLAY="TRUE" SEARCH="FALSE" CLASSIFY="FALSE">Brand</SYN>
      </DVAL>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905325"/>
          <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">hanes</SYN>
        </DVAL>
      </DIMENSION_NODE>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905315"/>
          <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">lee</SYN>
        </DVAL>
      </DIMENSION_NODE>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905281"/>
          <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">levi's</SYN>
        </DVAL>
      </DIMENSION_NODE>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905264"/>
          <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">braun</SYN>
        </DVAL>
      </DIMENSION_NODE>
    </DIMENSION_NODE>
  </DIMENSION>
</DIMENSIONS>
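Even after iterating over the list and finding all the nodes to remove, tree.write("out.xml") always writes out the original XML. Basically, I need to remove the identified nodes from the original XML.
Expected output: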
<DIMENSIONS>
<NUM_DVALS>88816</NUM_DVALS>
<DIMENSION NAME="Brand" SRC_FILE="" SRC_TYPE="INTERNAL">
<DIMENSION_ID ID="4294905334" />
<DIMENSION_NODE>
<DVAL TYPE="EXACT">
<DVAL_ID ID="4294905264" />
<SYN CLASSIFY="TRUE" DISPLAY="TRUE" SEARCH="TRUE">braun</SYN>
</DVAL>
</DIMENSION_NODE>
</DIMENSION_NODE>
</DIMENSION>
</DIMENSIONS>
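Answer 0 (score: 1)
All the DIMENSION_NODE elements to be removed share the same parent DIMENSION_NODE, so it is more efficient to fetch that parent only once, before looping over removeList. More importantly, you want the parent DIMENSION_NODE rather than the child DIMENSION_NODE elements, so the correct XPath is ./DIMENSION/DIMENSION_NODE. In your loop, tree.findall(...) returns a plain Python list, so parent.remove(tag) only drops the item from that list and leaves the tree untouched, which is why out.xml matches the input. In short, try replacing the second for loop with the following code: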
parent = tree.find('./DIMENSION/DIMENSION_NODE')
for tag in removeList:
    parent.remove(tag)
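To make the difference concrete, here is a minimal sketch comparing the two kinds of remove. The tiny XML string and the variable names are made up for illustration and are not part of the question's data.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the question's XML, for illustration only.
xml_text = """<DIMENSION><DIMENSION_NODE>
<DIMENSION_NODE><DVAL><DVAL_ID ID="1"/></DVAL></DIMENSION_NODE>
<DIMENSION_NODE><DVAL><DVAL_ID ID="2"/></DVAL></DIMENSION_NODE>
</DIMENSION_NODE></DIMENSION>"""
root = ET.fromstring(xml_text)

children = root.findall('./DIMENSION_NODE/DIMENSION_NODE')
victim = children[0]

# list.remove(): only the Python list shrinks; the tree still has both children.
children.remove(victim)
count_after_list_remove = len(root.findall('./DIMENSION_NODE/DIMENSION_NODE'))

# Element.remove() on the parent: the child is actually detached from the tree.
parent = root.find('./DIMENSION_NODE')
parent.remove(victim)
count_after_element_remove = len(root.findall('./DIMENSION_NODE/DIMENSION_NODE'))

print(count_after_list_remove, count_after_element_remove)
```

The first count stays at 2 and the second drops to 1, which matches the symptom in the question: the list was modified, but the document was not.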
Here is a complete working example for demonstration (you only need to replace the value of source with the actual XML):
import xml.etree.ElementTree as ET
source = """replace with the XML in question"""
root = ET.fromstring(source)
keeper_data = ['4294905264']
# collect the child DIMENSION_NODE elements whose ID is not in keeper_data
instances = root.findall('.//DIMENSION/DIMENSION_NODE/DIMENSION_NODE')
removeList = list()
for instance in instances:
    data1 = instance.find('./DVAL/DVAL_ID')
    if data1.attrib.get("ID") not in keeper_data:
        removeList.append(instance)
# fetch the shared parent once, then remove the children from it
parent = root.find('.//DIMENSION/DIMENSION_NODE')
for tag in removeList:
    parent.remove(tag)
# tostring() returns bytes on Python 3, so decode before printing
print(ET.tostring(root).decode())
Taking the XML from the question as the value of the source variable, the output is:
<DIMENSIONS>
  <NUM_DVALS>88816</NUM_DVALS>
  <DIMENSION NAME="Brand" SRC_FILE="" SRC_TYPE="INTERNAL">
    <DIMENSION_ID ID="4294905334" />
    <DIMENSION_NODE>
      <DVAL TYPE="EXACT">
        <DVAL_ID ID="2" />
        <SYN CLASSIFY="FALSE" DISPLAY="TRUE" SEARCH="FALSE">Brand</SYN>
      </DVAL>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905264" />
          <SYN CLASSIFY="TRUE" DISPLAY="TRUE" SEARCH="TRUE">braun</SYN>
        </DVAL>
      </DIMENSION_NODE>
    </DIMENSION_NODE>
  </DIMENSION>
</DIMENSIONS>
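If the nodes to remove did not all share one parent (for example, with several DIMENSION elements in the file), a possible variation is to build a child-to-parent map first, since ElementTree elements carry no parent pointer. The sketch below is only an illustration: remove_unwanted_nodes, parent_of and the shortened sample are hypothetical names and data, and skipping nodes that have no DVAL_ID is an assumption rather than something the question specifies.

```python
import xml.etree.ElementTree as ET

def remove_unwanted_nodes(root, keep_ids):
    """Remove every nested DIMENSION_NODE whose DVAL_ID is not in keep_ids."""
    # ElementTree has no parent pointer, so build a child -> parent map up front.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # findall() returns a list, so removing nodes while looping over it is safe.
    for node in root.findall('.//DIMENSION_NODE/DIMENSION_NODE'):
        dval_id = node.find('./DVAL/DVAL_ID')
        # Assumption: nodes without a DVAL_ID are left in place.
        if dval_id is not None and dval_id.get('ID') not in keep_ids:
            parent_of[node].remove(node)
    return root

# Shortened, hypothetical version of the question's XML.
sample = """<DIMENSIONS>
<DIMENSION NAME="Brand">
<DIMENSION_NODE>
<DVAL><DVAL_ID ID="2"/></DVAL>
<DIMENSION_NODE><DVAL><DVAL_ID ID="4294905325"/></DVAL></DIMENSION_NODE>
<DIMENSION_NODE><DVAL><DVAL_ID ID="4294905264"/></DVAL></DIMENSION_NODE>
</DIMENSION_NODE>
</DIMENSION>
</DIMENSIONS>"""

root = remove_unwanted_nodes(ET.fromstring(sample), {'4294905264'})
remaining = [e.get('ID') for e in root.iter('DVAL_ID')]
print(remaining)
```

To save the result, ET.ElementTree(root).write('out.xml', encoding='UTF-8', xml_declaration=True) should work. Note that ElementTree does not keep the DOCTYPE line when writing, so it would have to be added back separately if the consumer of the file needs it.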