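This XML document contains a series of events-data tags. I want to extract information from the latest one. For example, in the code below, I want to go to the last events-data tag, go down to its event-date tag, and extract the text of the date child tag. I am currently using BeautifulSoup in Python to traverse this document. Any ideas?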
<?xml version="1.0" encoding="UTF-8"?>
<first-tag>
<second-tag>
<events-data>
<event-date>
<date>20040913</date>
</event-date>
</events-data>
<events-data> <!-- the one I want to traverse to grab the date text -->
<event-date>
<date>20040913</date>
</event-date>
</events-data>
</second-tag>
</first-tag>
Answer 0 (score: 1)
Here is a solution using BeautifulSoup 3:
# BeautifulSoup 3 (Python 2 only)
from BeautifulSoup import BeautifulStoneSoup
xml_str = \
'''
<?xml version="1.0" encoding="UTF-8"?>
<first-tag>
<second-tag>
<events-data>
<event-date>
<date>20040913</date>
</event-date>
</events-data>
<events-data>
<event-date>
<date>20040913</date>
</event-date>
</events-data>
</second-tag>
</first-tag>
'''
soup = BeautifulStoneSoup(xml_str)
# Match every tag named "events-data"
event_data_location = lambda x: x.name == "events-data"
events = soup.findAll(event_data_location)
if events:
    # The last events-data tag; .text returns the text it contains (here, the date)
    print events[-1].text
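BeautifulSoup 3 only runs on Python 2, so here is a rough sketch of the same idea for Python 3 using the standard library's xml.etree.ElementTree instead of BeautifulSoup. This is a swapped-in technique, not what the answer above uses. The function name last_event_date is made up for illustration, and the first date in the sample is changed to 20030101 (an assumption, not in the original document) only so the output shows which element was picked.

```python
import xml.etree.ElementTree as ET

# Sample document modeled on the question. The first <date> value is
# altered (assumption) so the two entries can be told apart.
XML_STR = """<?xml version="1.0" encoding="UTF-8"?>
<first-tag>
  <second-tag>
    <events-data>
      <event-date>
        <date>20030101</date>
      </event-date>
    </events-data>
    <events-data>
      <event-date>
        <date>20040913</date>
      </event-date>
    </events-data>
  </second-tag>
</first-tag>
"""


def last_event_date(xml_text):
    """Return the <date> text of the last <events-data> element, or None.

    Hypothetical helper, named for this example only.
    """
    # strip() removes leading whitespace, which the parser rejects
    # when it appears before the XML declaration.
    root = ET.fromstring(xml_text.strip())
    # iter() walks the tree in document order, so the last item in the
    # list is the last <events-data> in the file.
    events = list(root.iter("events-data"))
    if not events:
        return None
    # Follow event-date -> date below the last <events-data>.
    date_node = events[-1].find("event-date/date")
    return date_node.text if date_node is not None else None


if __name__ == "__main__":
    print(last_event_date(XML_STR))
```

Note that "last" here means last in document order. If "latest" should mean the most recent date value, the dates would need to be compared rather than relying on position.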