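The code below works fine, but is there a more Pythonic way to get the same functionality? I just want to parse the XML and get the text from a few elements (name, name_status, url).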
from lxml import etree
from urllib2 import urlopen

def ask_CoL(url):
    tree = etree.parse(urlopen(url))
    tn = [el.get('total_number_of_results') for el in tree.iter('results')]
    try:
        nr = int(tn[0])
    except ValueError:
        nr = 0
    if nr == 1:
        newstr = str([el.text for el in tree.getiterator(tag='name')])\
            .strip("[]'") + ','\
            + str([el.text for el in tree.getiterator(tag='name_status')])\
            .strip("[]'") + ','\
            + str([el.text for el in tree.getiterator(tag='url')])\
            .strip("[]'") + '\n'
    else:
        newstr = 'NA\n'
    return newstr
Sample XML:
<results id="" name="Theragra chalcogramma" total_number_of_results="1" number_of_results_returned="1" start="0" error_message="" version="1.6 rev 1152">
  <result>
    <id>9037795</id>
    <name>Theragra chalcogramma</name>
    <rank>Species</rank>
    <name_status>accepted name</name_status>
    <online_resource>http://www.fishbase.org/Summary/SpeciesSummary.php?ID=318</online_resource>
    <source_database>FishBase</source_database>
    <source_database_url>http://www.fishbase.org</source_database_url>
    <name_html><i>Theragra chalcogramma</i> (Pallas, 1814)</name_html>
    <url>http://www.catalogueoflife.org/col/details/species/id/9037795</url>
  </result>
</results>
Answer 0 (score: 1)
You can simplify both the interface and the implementation:
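import urllib2
from xml.etree import cElementTree as etree

def f(url):
    tree = etree.parse(urllib2.urlopen(url))
    # <results> is the root element, so search for its <result> child;
    # tree.find('results') would look for a child named 'results' and return None.
    el = tree.find('result')
    if el is not None:
        # Missing or empty elements become empty strings.
        lst = [el.findtext(tag) or '' for tag in "name name_status url".split()]
        return ','.join(lst)
    # Falls through and returns None when there is no <result> element.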
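To try the same idea without a network call, here is a minimal sketch for Python 3 that parses the XML from a string instead of a URL. The function name `extract_fields`, the `FIELDS` tuple, and the trimmed `SAMPLE` document are assumptions made for illustration; they are not part of the original question or answer. It keeps the question's behaviour of returning 'NA' unless exactly one result is reported.

```python
import xml.etree.ElementTree as ET

# Hypothetical default: the three elements the question asks for.
FIELDS = ("name", "name_status", "url")

def extract_fields(xml_text, fields=FIELDS):
    """Return the requested fields of the first <result> as a comma-separated
    string, or 'NA' unless exactly one result is reported."""
    root = ET.fromstring(xml_text)  # root is the <results> element
    if root.get("total_number_of_results") != "1":
        return "NA"
    result = root.find("result")  # first direct <result> child
    if result is None:
        return "NA"
    # findtext returns the default for a missing element; `or ""` also
    # guards against an element with no text.
    return ",".join(result.findtext(tag, default="") or "" for tag in fields)

# Trimmed version of the sample XML from the question (assumed layout).
SAMPLE = """\
<results name="Theragra chalcogramma" total_number_of_results="1">
  <result>
    <id>9037795</id>
    <name>Theragra chalcogramma</name>
    <name_status>accepted name</name_status>
    <url>http://www.catalogueoflife.org/col/details/species/id/9037795</url>
  </result>
</results>
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

Keeping the parsing separate from the download makes the function easy to test; fetching the document (for example with `urllib.request.urlopen` in Python 3) can then be a one-line wrapper around it.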