I have an XML file like this:
<?xml version="1.0" encoding="UTF-8"?>
<kw name="k1" library="k1">
  <kw name="k2" library="k2">
    <kw name="Keep This" library="Keep This">
      <c name="c4" library="c4">
      </c>
    </kw>
    <kw name="k3" library="k3">
      <c name="c4" library="c4">
      </c>
    </kw>
    <c name="c3" library="c3">
      <c name="c4" library="c4">
      </c>
    </c>
  </kw>
</kw>
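I want to remove kw tags, with some exceptions, according to the following rules:
A kw tag whose attribute value is "Keep This", and every kw tag that contains one, should be kept.
Any other kw tag should be removed from the XML.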
So the output should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<kw name="k1" library="k1">
  <kw name="k2" library="k2">
    <kw name="Keep This" library="Keep This">
      <c name="c4" library="c4">
      </c>
    </kw>
    <c name="c3" library="c3">
      <c name="c4" library="c4">
      </c>
    </c>
  </kw>
</kw>
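Tracing through my recursive function is really difficult. Can someone help me with it, or recommend another way to meet my requirements? Here is what I tried: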
import xml.etree.ElementTree as ET
tree = ET.parse('a.xml')
root = tree.getroot()
def check(root):
    # if a child has the "kw" tag, recurse into the children
    if 'kw' in ([child.tag for child in root]):
        for child in root:
            flag = check(child)
            # remove the child if check() returned a false value
            if not flag:
                root.remove(child)
    # if no child has the "kw" tag
    else:
        if root.tag == 'kw':
            # check whether this node's tag is kw and it has "Keep This"
            if 'Keep This' in [root.attrib[child] for child in root.attrib]:
                return True
            # remove if this node's tag is kw but it lacks "Keep This"
            else:
                print('remove')
                return False
        else:
            return True
check(root)
ET.dump(root)
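Two things in this attempt are likely to make the recursion hard to follow. First, when a node has kw children, check() never reaches a return statement, so it returns None, and the parent treats that None as "remove". Second, it calls root.remove(child) while looping over root; removing a child shifts the remaining children, so the loop skips the next sibling. The xml.etree.ElementTree documentation notes that modifying an element while iterating over it causes problems, just as with Python lists. A minimal demonstration on a throwaway element (the tags r, a, b and c are hypothetical, not from the question's file):

```python
import xml.etree.ElementTree as ET

# Hypothetical three-child element, used only to show the iteration problem.
root = ET.fromstring('<r><a/><b/><c/></r>')

# Removing while iterating over the same element: after 'a' is removed,
# 'b' moves into its slot and the loop continues with 'c', so 'b' is skipped.
for child in root:
    root.remove(child)
skipped = [child.tag for child in root]
print(skipped)  # ['b']

# Iterating over a snapshot (list(...)) removes every child as intended.
root = ET.fromstring('<r><a/><b/><c/></r>')
for child in list(root):
    root.remove(child)
print(len(root))  # 0
```

Collecting the children to remove first and deleting them after the loop, or looping over list(node), avoids the skipped siblings.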
Answer (score: 1)
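You can use the following recursive function instead. Note that an exception is used to tell the parent to remove a child, because a node can only be removed through its parent, while the boolean return value only indicates whether a node with the tag kw and an attribute value of "Keep This" was found (the node itself or somewhere below it). A benefit of this approach is that if no "keep" node is found under the root at all, the caller is notified that, by the rules, the root itself should be removed, which is impossible because it is the root: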
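import xml.etree.ElementTree as ET

def check(node):
    # A kw node with a "Keep This" attribute value is kept, along with everything under it.
    if node.tag == 'kw' and any(value == 'Keep This' for value in node.attrib.values()):
        return True
    keep = False
    removals = []
    for child in node:
        try:
            # True means a "keep" node was found in this child's subtree.
            if check(child):
                keep = True
        except RuntimeError:
            # The child asked to be removed; collect it so that nothing is
            # removed while we are still iterating over node.
            removals.append(child)
    for child in removals:
        node.remove(child)
    # A kw node with no "keep" node below it must be removed by its parent.
    if node.tag == 'kw' and not keep:
        raise RuntimeError('No "keep" node found under this node')
    return keep

tree = ET.parse('a.xml')
root = tree.getroot()
check(root)
ET.dump(root)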
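If the exception feels indirect, the same pruning can be expressed with the return value alone. The following is a minimal sketch, not the answerer's code: the parent inspects each child's result and removes kw children that report no "keep" node. The names prune and SAMPLE are hypothetical, and the rule it encodes (keep a kw node whose attribute value is "Keep This", keep the nodes that contain it, drop other kw nodes, leave non-kw nodes in place) is read off the expected output in the question. It also sidesteps a subtlety of catching RuntimeError: since Python 3.5, RecursionError is a subclass of RuntimeError, so on a very deeply nested document check() could silently drop a child instead of failing.

```python
import xml.etree.ElementTree as ET

# Hypothetical copy of the question's sample input, minus the XML declaration.
SAMPLE = """<kw name="k1" library="k1">
  <kw name="k2" library="k2">
    <kw name="Keep This" library="Keep This">
      <c name="c4" library="c4"></c>
    </kw>
    <kw name="k3" library="k3">
      <c name="c4" library="c4"></c>
    </kw>
    <c name="c3" library="c3">
      <c name="c4" library="c4"></c>
    </c>
  </kw>
</kw>"""

def prune(node):
    """Return True if node is, or contains, a kw element with a "Keep This"
    attribute value; kw children without one are removed along the way."""
    if node.tag == 'kw' and 'Keep This' in node.attrib.values():
        return True  # keep this node and leave everything under it untouched
    found = False
    # Iterate over a copy so removing a child doesn't skip its next sibling.
    for child in list(node):
        if prune(child):
            found = True
        elif child.tag == 'kw':
            node.remove(child)  # a kw subtree with no "keep" node in it
    return found

root = ET.fromstring(SAMPLE)
if not prune(root) and root.tag == 'kw':
    # The rules say the root should go, but it can't; the caller decides what to do.
    print('No "keep" node found under the root')
ET.dump(root)
```

Unlike check(), prune() never raises; a caller that needs to know whether the root itself failed the rule has to look at the return value, as the last lines do.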
With the sample input, check() produces:
<kw library="k1" name="k1">
  <kw library="k2" name="k2">
    <kw library="Keep This" name="Keep This">
      <c library="c4" name="c4">
      </c>
    </kw>
    <c library="c3" name="c3">
      <c library="c4" name="c4">
      </c>
    </c>
  </kw>
</kw>
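In the output above, library appears before name even though the input lists name first. That matches Python versions before 3.8, where ElementTree sorted attributes when serializing; since Python 3.8 the serializer keeps the order the attributes have in the source, so on a current interpreter the same script is expected to print name="k1" library="k1" instead. A quick check on a single element:

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<kw name="k1" library="k1"/>')
# On Python 3.8+ the attribute order from the source is preserved on output.
text = ET.tostring(elem, encoding='unicode')
print(text)  # <kw name="k1" library="k1" />
```

The element structure is the same either way; only the attribute order in the printed text differs between versions.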