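I want to output a list of the main elements of a page. An excerpt is printed below. I need a way, using only Python, to grab the text between the text tags. If it works, I would want the output for the content below to be:
Mathematics, Differential equation, Geometry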
<language>english</language>
<concepts>
<concept>
<text>Mathematics</text>
<relevance>0.988094</relevance>
<dbpedia>http://dbpedia.org/resource/Mathematics</dbpedia>
<freebase>http://rdf.freebase.com/ns/m.04rjg</freebase>
<opencyc>http://sw.opencyc.org/concept/Mx4rvVjHd5wpEbGdrcN5Y29ycA</opencyc>
</concept>
<concept>
<text>Differential equation</text>
<relevance>0.729187</relevance>
<dbpedia>http://dbpedia.org/resource/Differential_equation</dbpedia>
<freebase>http://rdf.freebase.com/ns/m.050fdl</freebase>
<opencyc>http://sw.opencyc.org/concept/Mx4rvXXRFJwpEbGdrcN5Y29ycA</opencyc>
</concept>
<concept>
<text>Geometry</text>
<relevance>0.677052</relevance>
<dbpedia>http://dbpedia.org/resource/Geometry</dbpedia>
<freebase>http://rdf.freebase.com/ns/m.025x7g_</freebase>
<opencyc>http://sw.opencyc.org/concept/Mx4rvgcAf5wpEbGdrcN5Y29ycA</opencyc>
</concept>
<concept>
Answer 0 (score: 0)
You should look at an XML parser. They are readily available. For example:
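from xml.etree import ElementTree
# xmlstring must hold a complete, well-formed document with a single root element
doc = ElementTree.fromstring(xmlstring)
# './/text' matches every <text> element below the root
for tag in doc.findall('.//text'):
    print(tag.text)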
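Note that the excerpt in the question is cut off and has two top-level elements (language and concepts), so ElementTree.fromstring would likely raise a ParseError on it as posted. A minimal sketch of one way around that, assuming the input may be either a complete document or a fragment like the one shown: try the parser first and fall back to a plain pattern match. The helper name extract_texts and the shortened sample string are hypothetical, made up for illustration, and the regular expression is a stopgap for simple fragments rather than a real XML parse.

```python
import re
from xml.etree import ElementTree


def extract_texts(xmlstring):
    """Return the contents of every <text> element as a list of strings."""
    try:
        # Works when xmlstring is a complete document with one root element.
        doc = ElementTree.fromstring(xmlstring)
        return [tag.text for tag in doc.iter("text")]
    except ElementTree.ParseError:
        # Fallback for truncated or multi-root fragments: a simple pattern
        # match on <text>...</text>, which does not validate the XML.
        return re.findall(r"<text>(.*?)</text>", xmlstring, re.DOTALL)


# Shortened, hypothetical version of the excerpt from the question,
# including the dangling <concept> at the end.
sample = """<language>english</language>
<concepts>
<concept>
<text>Mathematics</text>
</concept>
<concept>
<text>Differential equation</text>
</concept>
<concept>
<text>Geometry</text>
</concept>
<concept>
"""

print(", ".join(extract_texts(sample)))
# prints: Mathematics, Differential equation, Geometry
```

If the full response from the service is available, passing the whole document to the parser is the safer route, since the fallback will also match text tags that appear inside comments or CDATA sections.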