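Suppose I have the following XML response from the MediaWiki API. I want to find the earliest date on which the wiki topic was revised, which in this case is 2005-08-23. How can I parse the XML to find it? I'm using Python, by the way.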
<?xml version="1.0"?>
<api>
<query-continue>
<revisions rvcontinue="46214352" />
</query-continue>
<query>
<pageids>
<id>2516600</id>
</pageids>
<pages>
<page pageid="2516600" ns="0" title="!Kung language">
<revisions>
<rev timestamp="2005-08-23T00:58:40Z" />
<rev timestamp="2005-08-23T01:01:00Z" />
<rev timestamp="2005-09-02T07:21:37Z" />
<rev timestamp="2005-09-02T07:24:28Z" />
<rev timestamp="2006-01-06T07:45:35Z" />
<rev timestamp="2006-03-22T09:03:23Z" />
<rev timestamp="2006-03-30T05:50:12Z" />
<rev timestamp="2006-03-30T20:33:22Z" />
<rev timestamp="2006-03-30T20:35:05Z" />
<rev timestamp="2006-03-30T20:37:16Z" />
</revisions>
</page>
</pages>
</query>
</api>
I tried the following:
revisions = text.getElementsByTagName("revisions")
for x in revisions:
    children = x.childNodes
    for y in children:
        print(y.nodeValue)
But all this does is print None.
Answer 0 (score: 1)
I would use lxml with an XPath expression:
from lxml import etree
root = etree.fromstring(xml)
timestamps = root.xpath('//rev/@timestamp')
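The XPath query collects every timestamp but does not yet pick the earliest one. Here is a minimal sketch of that last step, assuming all timestamps use the same UTC ISO 8601 form shown in the question (so plain string comparison matches chronological order). It swaps in the standard library's xml.etree.ElementTree for lxml so that it runs without extra installs; the trimmed XML string and the variable names are illustrative, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question (assumed to have the same structure).
xml = """<?xml version="1.0"?>
<api>
  <query>
    <pages>
      <page pageid="2516600" ns="0" title="!Kung language">
        <revisions>
          <rev timestamp="2005-08-23T00:58:40Z" />
          <rev timestamp="2005-09-02T07:21:37Z" />
          <rev timestamp="2006-03-30T20:37:16Z" />
        </revisions>
      </page>
    </pages>
  </query>
</api>"""

root = ET.fromstring(xml)

# Collect the timestamp attribute of every <rev> element, at any depth.
timestamps = [rev.get("timestamp") for rev in root.iter("rev")]

# Same-format ISO 8601 strings sort lexicographically in chronological order.
earliest = min(timestamps)

# Keep only the date part, before the "T" separator.
earliest_date = earliest.split("T")[0]
print(earliest_date)  # 2005-08-23
```

With lxml, the same min(...) call should work directly on the list returned by root.xpath('//rev/@timestamp').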
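As for your code, you are not reading the element's attribute. nodeValue is always None for element nodes, which is why None gets printed. To read an attribute, use getAttribute: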
print(y.getAttribute('timestamp'))
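One caveat with the loop from the question: when the XML contains line breaks between elements, childNodes also includes whitespace text nodes, and those have no getAttribute method. A possible way around this, sketched here under the assumption that the question's text variable is an xml.dom.minidom document, is to ask for the rev elements directly. The shortened XML string is illustrative only.

```python
from xml.dom import minidom

# Shortened sample with the same nesting as the response in the question.
xml = """<api><query><pages><page pageid="2516600">
<revisions>
<rev timestamp="2005-08-23T00:58:40Z" />
<rev timestamp="2005-08-23T01:01:00Z" />
</revisions>
</page></pages></query></api>"""

doc = minidom.parseString(xml)

# getElementsByTagName returns only element nodes, so the whitespace
# text nodes between the <rev> tags are skipped automatically.
timestamps = [rev.getAttribute("timestamp")
              for rev in doc.getElementsByTagName("rev")]

for ts in timestamps:
    print(ts)

# Earliest revision, assuming all timestamps share this UTC format.
earliest = min(timestamps)
```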