Suppose I have an XML document in the following format:
<root>
  <foos>
    <foo>the quick <bar>brown </bar>fox</foo>
  </foos>
  <!-- Lots more <foo></foo> -->
</root>
How can I extract the full text string "the quick fox" as well as the string "brown"?
import xml.etree.ElementTree as ET

doc = ET.parse(xmldocument).getroot()
foos = doc.find('foos')
for foo in foos:
    print(foo.text)  # prints only 'the quick '
I'm not sure how to solve this.
Answer 0 (score: 2)
You could also try something like this, which automatically iterates over the text of all nested tags:
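foos = doc.find('foos')
for foo in foos:
    # itertext() yields foo's own text and the text of every nested element, in document order
    for text in foo.itertext():
        print(text.strip(), end=' ')
    print()  # end the line after each <foo>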
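Note that the loop above prints all nested text together ("the quick brown fox"), while the question asks for "the quick fox" and "brown" separately. One possible way to split them, sketched here with only the standard `text` and `tail` attributes of ElementTree elements, is to join the parent's own text with the tail of each child. The helper name `split_foo_text` and the inline `XML` string are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch is self-contained.
XML = """<root>
  <foos>
    <foo>the quick <bar>brown </bar>fox</foo>
  </foos>
</root>"""


def split_foo_text(foo):
    """Hypothetical helper: return (own_text, child_texts) for one <foo> element."""
    # foo.text is the text before the first child element.
    own_parts = [foo.text or ""]
    child_texts = []
    for child in foo:
        # All text inside the child (for example <bar>), including deeper nesting.
        child_texts.append("".join(child.itertext()).strip())
        # child.tail is the text that follows the child, still inside <foo>.
        own_parts.append(child.tail or "")
    return "".join(own_parts).strip(), child_texts


root = ET.fromstring(XML)
results = [split_foo_text(foo) for foo in root.find("foos")]
for own_text, child_texts in results:
    print(own_text, child_texts)
```

The `or ""` guards are there because `text` and `tail` are `None` when no text is present at that position.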
Answer 1 (score: 0)
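# Note: XmlXPathSelector is a legacy Scrapy API; newer Scrapy releases
# use scrapy.selector.Selector with .xpath() instead.
from scrapy.selector import XmlXPathSelector

xml = """
<root>
<foos>
<foo>the quick <bar>brown </bar>fox</foo>
</foos>
</root>
"""

hxs = XmlXPathSelector(text=xml)
foos = hxs.select('//foos')
for one in foos:
    # Collect every text node under each <foo>, including nested <bar> text
    text = one.select('./foo//text()').extract()
    text = ''.join(text)
    print(text)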