I have a very large XML file, and I want to extract some tags from it and write them to another XML file. I wrote this code:
import xml.etree.cElementTree as CE
tree = CE.ElementTree()
root = CE.Element("root")
i = 0
for event, elem in CE.iterparse('data.xml'):
    if elem.tag == "ActivityRef":
        print(elem.tag)
        a = CE.Element(elem.tag)
        root.append(elem)
        elem.clear()
        i += 1
    if i == 200:
        break
But I am not getting the result I want. I get:
<root>
  <ActivityRef />
  <ActivityRef />
  <ActivityRef />
  <ActivityRef />
  ...
</root>
instead of this:
<root>
  <ActivityRef>
    <Id>2008-12-11T20:43:07Z</Id>
  </ActivityRef>
  <ActivityRef>
    <Id>2008-10-11T20:43:07Z</Id>
  </ActivityRef>
  ...
</root>
Edit
Input file:
<?xml version="1.0" encoding="UTF-8"?>
<Folders>
  <History>
    <Running>
      <ActivityRef>
        <Id>2009-03-14T17:05:55Z</Id>
      </ActivityRef>
      <ActivityRef>
        <Id>2009-03-13T06:12:42Z</Id>
      </ActivityRef>
      <ActivityRef>
        <Id>2009-03-08T09:00:29Z</Id>
      </ActivityRef>
      <ActivityRef>
        <Id>2009-03-04T19:39:39Z</Id>
      </ActivityRef>
      ...
    </Running>
  </History>
</Folders>
I also need to remove these elements from the source file. Can anyone help? Thanks in advance.
Answer 0 (score: 0)
Use XPath
import xml.etree.ElementTree as ET
data = '''<?xml version="1.0" encoding="UTF-8"?>
<Folders>
<History>
<Running>
<ActivityRef>
<Id>2009-03-14T17:05:55Z</Id>
</ActivityRef>
<ActivityRef>
<Id>2009-03-13T06:12:42Z</Id>
</ActivityRef>
<ActivityRef>
<Id>2009-03-08T09:00:29Z</Id>
</ActivityRef>
<ActivityRef>
<Id>2009-03-04T19:39:39Z</Id>
</ActivityRef>
</Running>
</History>
</Folders>'''
root = ET.fromstring(data)
# 'activities' contains the elements you are looking for
activities = root.findall('.//ActivityRef')
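The lookup above only finds the elements. To also collect them under a new <root> and remove them from the source tree, something like the following rough sketch might work. The helper name move_elements, its tag and limit parameters, and the shortened sample string are my own, made up for illustration, and it assumes the whole document fits in memory. It relies on the './/ActivityRef/..' path to get each parent, since ElementTree elements do not keep a reference to their parent.

```python
import xml.etree.ElementTree as ET


def move_elements(source_root, tag="ActivityRef", limit=None):
    """Move up to `limit` <tag> elements from source_root into a new <root>.

    Hypothetical helper for illustration; not part of ElementTree.
    """
    new_root = ET.Element("root")
    moved = 0
    # './/tag/..' selects every element that has a <tag> child.
    for parent in source_root.findall(f".//{tag}/.."):
        # Iterate over a snapshot, because we remove children as we go.
        for child in list(parent):
            if child.tag != tag:
                continue
            if limit is not None and moved >= limit:
                return new_root
            parent.remove(child)    # delete from the source tree
            new_root.append(child)  # children such as <Id> come along intact
            moved += 1
    return new_root


# Shortened version of the input from the question.
sample = '''<Folders><History><Running>
<ActivityRef><Id>2009-03-14T17:05:55Z</Id></ActivityRef>
<ActivityRef><Id>2009-03-13T06:12:42Z</Id></ActivityRef>
</Running></History></Folders>'''

source_root = ET.fromstring(sample)
extracted = move_elements(source_root)
extracted_ids = [ref.findtext("Id") for ref in extracted]
remaining = source_root.findall(".//ActivityRef")
```

Both trees can then be saved with ET.ElementTree(extracted).write(...) and ET.ElementTree(source_root).write(...), using whatever output file names you choose. As for the empty <ActivityRef /> elements in the question: root.append(elem) stores the same object that elem.clear() then empties, so the children are gone by the time the tree is written. If the file is too big to load at once and you stay with iterparse, appending copy.deepcopy(elem) before calling elem.clear() should keep the children.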