How to extract text from tags using ElementTree

Time: 2012-06-18 22:02:15

Tags: python xml elementtree

I have the following XML file:

<class id="1" name="good/bad">
    <verb>
        <token>like</token>
        <token>feel</token>
    </verb>
    <mess>This is <sugg>not</sugg> text</mess>
    <id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
    <id type="correct">I'm glad to see you.</id>
</class>

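I need to extract the text from specific tags. There are only a few examples at http://effbot.org, and the documentation is sparse. Are there good examples somewhere else? How can I treat the text in identical tags (token) as separate entities? Thanks in advance! The result should look roughly like this: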

(like) feel > not #This is not text

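1 answer:

Answer 0: (score: 0)

It is not clear to me how you want to handle the content of the <mess> element. For the children of the <verb> element, try the following: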

import xml.etree.ElementTree as ET
the_tree = ET.fromstring('''<class id="1" name="good/bad">
    <verb>
        <token>like</token>
        <token>feel</token>
    </verb>
    <mess>This is <sugg>not</sugg> text</mess>
    <id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
    <id type="correct">I'm glad to see you.</id>
</class>''')
# Element.getchildren() was removed in Python 3.9; list(element) returns the same children
elems = list(the_tree.find('./verb'))
verbs = [verb.text for verb in elems]
# -> ['like', 'feel']
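
For the <mess> element, one possible reading of the question is that both the nested <sugg> text and the full sentence are wanted. The sketch below is only an assumption about that intent: it uses the standard .text, .tail and itertext() members to read mixed content, and the final format string is a guess at the output line shown in the question.

```python
import xml.etree.ElementTree as ET

XML = '''<class id="1" name="good/bad">
    <verb>
        <token>like</token>
        <token>feel</token>
    </verb>
    <mess>This is <sugg>not</sugg> text</mess>
    <id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
    <id type="correct">I'm glad to see you.</id>
</class>'''

root = ET.fromstring(XML)
mess = root.find('mess')
sugg = mess.find('sugg')

# .text holds only the text before the first child element.
leading_text = mess.text
# The text of the nested <sugg> element itself.
suggestion = sugg.text
# Text that follows a child element is stored on that child's .tail.
trailing_text = sugg.tail
# itertext() yields all text of the element and its descendants in document order.
full_message = ''.join(mess.itertext())

# Each <token> is a separate element, so each text is a separate list item.
tokens = [token.text for token in root.findall('./verb/token')]

# Assumed output format, modeled on the line in the question.
line = '({}) {} > {} #{}'.format(tokens[0], tokens[1], suggestion, full_message)
print(line)
```

The point to remember is that mess.text alone stops at the first child tag, so joining itertext() (or combining .text and .tail by hand) is needed to recover the whole sentence.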

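If your file is larger, you may prefer this alternative way of accessing elements, which looks them up by the value of their id attribute: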

import xml.etree.ElementTree as ET
tree, id_map = ET.XMLID('''<class id="1" name="good/bad">
    <verb>
        <token>like</token>
        <token>feel</token>
    </verb>
    <mess>This is <sugg>not</sugg> text</mess>
    <id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
    <id type="correct">I'm glad to see you.</id>
</class>''')
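# id_map maps each id attribute value to its element; '1' is the <class> root here
elems = id_map['1'].find('verb')
# Iterating over an element yields its direct children
verbs = [verb.text for verb in elems]
# -> ['like', 'feel']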
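
Note that ET.XMLID still parses the whole document into memory. If file size is the real concern, a streaming parse with ET.iterparse may be a better fit. The following is a minimal sketch under that assumption; the helper name iter_token_texts is hypothetical, and the in-memory BytesIO stands in for a real file opened in binary mode.

```python
import io
import xml.etree.ElementTree as ET

XML = '''<class id="1" name="good/bad">
    <verb>
        <token>like</token>
        <token>feel</token>
    </verb>
    <mess>This is <sugg>not</sugg> text</mess>
    <id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
    <id type="correct">I'm glad to see you.</id>
</class>'''


def iter_token_texts(source):
    """Yield the text of every <token> element while the input is being parsed.

    Hypothetical helper for illustration; source is a filename or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == 'token':
            # At the 'end' event the element's text is complete.
            yield elem.text
        elif elem.tag == 'verb':
            # Drop the already-processed children to keep memory use low.
            elem.clear()


verbs = list(iter_token_texts(io.BytesIO(XML.encode('utf-8'))))
print(verbs)
```

For a document as small as the one in the question this gives the same list as the snippets above; the difference only matters when the input is too large to hold comfortably as a full tree.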