I have this XML input file:
<?xml version="1.0"?>
<zero>
<First>
<second>
<third-num>1</third-num>
<third-def>object001</third-def>
<third-len>458</third-len>
</second>
<second>
<third-num>2</third-num>
<third-def>object002</third-def>
<third-len>426</third-len>
</second>
<second>
<third-num>3</third-num>
<third-def>object003</third-def>
<third-len>998</third-len>
</second>
</First>
</zero>
My goal is to remove every second-level element whose <third-def> is not a given value. To do that, I wrote this code:
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

inputfile = 'inputfile.xml'
tree = ET.parse(inputfile)
root = tree.getroot()
elem = tree.find('First')
for elem2 in tree.iter(tag='second'):
    if elem2.find('third-def').text == 'object001':
        pass
    else:
        elem.remove(elem2)
        #elem2.clear()
My problem is with elem.remove(elem2): it skips every other second-level element. Here is the output of this code:
<?xml version="1.0" ?>
<zero>
<First>
<second>
<third-num>1</third-num>
<third-def>object001</third-def>
<third-len>458</third-len>
</second>
<second>
<third-num>3</third-num>
<third-def>object003</third-def>
<third-len>998</third-len>
</second>
</First>
</zero>
Now, if I uncomment the elem2.clear() line, the script works, but the output is less nice, because it keeps all the removed second-level elements:
<?xml version="1.0" ?>
<zero>
<First>
<second>
<third-num>1</third-num>
<third-def>object001</third-def>
<third-len>458</third-len>
</second>
<second/>
<second/>
</First>
</zero>
Does anyone know why my element.remove() statement is wrong?
Answer 0 (score: 5)
You are looping over the live tree:
for elem2 in tree.iter(tag='second'):
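and then changing it while you iterate. The iteration's 'counter' is not told that the number of elements has changed, so when it is looking at element 0 and you remove that element, the iterator moves on to element number 1. But what used to be element number 1 is now element number 0, so it is never visited.
Capture a list of all the elements first, then loop over that: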
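The same index shift can be reproduced with a plain Python list. The following is only a minimal sketch of the mechanism: the list `items` and its contents are illustrative stand-ins for the three second-level elements, not part of the original code.

```python
# Illustrative stand-ins for the three <second> elements in the question.
items = ['object001', 'object002', 'object003']

# Removing from the list while looping over it: after 'object002' is
# removed at index 1, 'object003' slides into index 1, but the loop's
# internal position has already advanced to index 2, so it is skipped.
for item in items:
    if item != 'object001':
        items.remove(item)

print(items)  # ['object001', 'object003']
```

The surviving 'object003' corresponds to the third element that was left behind in the question's output.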
for elem2 in tree.findall('.//second'):
.findall() returns a list of results, which is not updated as you alter the tree.
Now the iteration won't skip the last element:
>>> print(ET.tostring(root).decode())
<zero>
<First>
<second>
<third-num>1</third-num>
<third-def>object001</third-def>
<third-len>458</third-len>
</second>
</First>
</zero>
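For completeness, here is a self-contained sketch of the fix that runs without an input file. It assumes the XML from the question is supplied as a string rather than read from inputfile.xml, and the helper name `keep_only` is hypothetical, introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# The question's document, inlined so the example needs no file on disk.
XML = """<zero>
  <First>
    <second><third-num>1</third-num><third-def>object001</third-def><third-len>458</third-len></second>
    <second><third-num>2</third-num><third-def>object002</third-def><third-len>426</third-len></second>
    <second><third-num>3</third-num><third-def>object003</third-def><third-len>998</third-len></second>
  </First>
</zero>"""


def keep_only(root, wanted):
    """Remove every <second> under <First> whose <third-def> differs from wanted."""
    parent = root.find('First')
    # findall() returns a separate list, so removing children from
    # `parent` inside the loop does not disturb the iteration.
    for second in parent.findall('second'):
        if second.findtext('third-def') != wanted:
            parent.remove(second)
    return root


root = keep_only(ET.fromstring(XML), 'object001')
remaining = [s.findtext('third-def') for s in root.iter('second')]
print(remaining)  # ['object001']
```

Calling remove() on the direct parent is what matters here: Element.remove() only removes direct children, which is why the loop searches from `parent` rather than from the document root.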
This phenomenon is not limited to ElementTree trees; see Loop "Forgets" to Remove Some Items.
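For plain lists, the usual remedies follow the same idea as .findall(): loop over a snapshot, or build a new list instead of removing in place. A short sketch, with the variable names `names` and `filtered` chosen purely for illustration:

```python
names = ['object001', 'object002', 'object003']

# Iterate over a shallow copy (names[:]) so that removals from the
# original list cannot shift the positions the loop is walking over.
for name in names[:]:
    if name != 'object001':
        names.remove(name)

print(names)

# Building a new, filtered list avoids in-place removal altogether.
source = ['object001', 'object002', 'object003']
filtered = [n for n in source if n == 'object001']
print(filtered)
```

Both approaches leave only 'object001'; the list comprehension is often the simpler choice when the original list does not need to be modified in place.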