How to print only certain XML elements

Time: 2015-10-28 20:42:05

Tags: python xml

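I want to output a list of a page's main elements. An excerpt is printed below. I need a way, using only Python, to grab the text between the text tags. If it works, I would like the output for the content below to be:

Mathematics, Differential equation, Geometry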

<language>english</language>
        <concepts>
            <concept>
                <text>Mathematics</text>
                <relevance>0.988094</relevance>
                <dbpedia>http://dbpedia.org/resource/Mathematics</dbpedia>
                <freebase>http://rdf.freebase.com/ns/m.04rjg</freebase>
                <opencyc>http://sw.opencyc.org/concept/Mx4rvVjHd5wpEbGdrcN5Y29ycA</opencyc>
            </concept>
            <concept>
                <text>Differential equation</text>
                <relevance>0.729187</relevance>
                <dbpedia>http://dbpedia.org/resource/Differential_equation</dbpedia>
                <freebase>http://rdf.freebase.com/ns/m.050fdl</freebase>
                <opencyc>http://sw.opencyc.org/concept/Mx4rvXXRFJwpEbGdrcN5Y29ycA</opencyc>
            </concept>
            <concept>
                <text>Geometry</text>
                <relevance>0.677052</relevance>
                <dbpedia>http://dbpedia.org/resource/Geometry</dbpedia>
                <freebase>http://rdf.freebase.com/ns/m.025x7g_</freebase>
                <opencyc>http://sw.opencyc.org/concept/Mx4rvgcAf5wpEbGdrcN5Y29ycA</opencyc>
            </concept>
            <concept>

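1 Answer:

Answer 0 (score: 0)

You should take a look at an XML parser. They are readily available; Python ships one in its standard library. For example: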

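from xml.etree import ElementTree

# xmlstring must hold the complete, well-formed XML document as a string
doc = ElementTree.fromstring(xmlstring)
# './/text' matches every <text> element at any depth below the root
for tag in doc.findall('.//text'):
    print(tag.text)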
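
To get the single comma-separated line the question asks for, the matched texts can be collected and joined. Below is a minimal sketch of that idea. Note that the excerpt in the question is cut off and does not show its root element, so the `<results>` wrapper, the `SAMPLE_XML` string, and the `concept_texts` helper are all assumptions made for illustration, not part of the original data or answer.

```python
from xml.etree import ElementTree

# Hypothetical, well-formed sample shaped like the excerpt in the question.
# The <results> root element is an assumption: the real root is not shown.
SAMPLE_XML = """<results>
    <language>english</language>
    <concepts>
        <concept>
            <text>Mathematics</text>
            <relevance>0.988094</relevance>
        </concept>
        <concept>
            <text>Differential equation</text>
            <relevance>0.729187</relevance>
        </concept>
        <concept>
            <text>Geometry</text>
            <relevance>0.677052</relevance>
        </concept>
    </concepts>
</results>"""


def concept_texts(xml_string):
    """Return the text of every <text> inside a <concept>, in document order."""
    root = ElementTree.fromstring(xml_string)
    # './/concept/text' limits the match to <text> children of <concept>,
    # so unrelated <text> elements elsewhere in the document are skipped.
    return [el.text for el in root.findall('.//concept/text') if el.text]


print(', '.join(concept_texts(SAMPLE_XML)))
# prints: Mathematics, Differential equation, Geometry
```

If the XML lives in a file rather than a string, `ElementTree.parse(path).getroot()` can take the place of `ElementTree.fromstring(...)`; the rest stays the same.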