I have the following XML file:
<class id="1" name="good/bad">
<verb>
<token>like</token>
<token>feel</token>
</verb>
<mess>This is <sugg>not</sugg> text</mess>
<id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
<id type="correct">I'm glad to see you.</id>
</class>
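I need to extract text from specific tags. There are only a few examples on http://effbot.org, and the documentation is sparse. Are there good examples somewhere else? How can I treat the text of elements that share the same tag (token) as separate entities? Thanks in advance! The result should look roughly like this: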
(like) feel > not #This is not text
Answer 0 (score: 0)
It is not clear to me how you want to handle the contents of the <mess> element.
For the children of the <verb> element, try the following:
import xml.etree.ElementTree as ET
the_tree = ET.fromstring('''<class id="1" name="good/bad">
<verb>
<token>like</token>
<token>feel</token>
</verb>
<mess>This is <sugg>not</sugg> text</mess>
<id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
<id type="correct">I'm glad to see you.</id>
</class>''')
# Element.getchildren() was removed in Python 3.9; list() over the element gives its children
elems = list(the_tree.find('./verb'))
verbs = [verb.text for verb in elems]
# -> ['like', 'feel']
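If your file is larger and its elements carry id attributes, you may prefer this alternative way of accessing elements, which looks them up by id: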
tree, id_map = ET.XMLID('''<class id="1" name="good/bad">
<verb>
<token>like</token>
<token>feel</token>
</verb>
<mess>This is <sugg>not</sugg> text</mess>
<id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
<id type="correct">I'm glad to see you.</id>
</class>''')
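# id_map maps each id attribute value to its element; '1' is the <class> element
elems = id_map['1'].find('verb')
# Iterating over an element yields its direct children (the <token> elements)
verbs = [verb.text for verb in elems]
# -> ['like', 'feel']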
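As for <mess>, one possible reading of the sample line in the question is that you want the suggestion on its own plus the full sentence with the child tags flattened. The sketch below assumes exactly that. The output format string is a guess based on the sample line, and the names xml_text, tokens, suggestion, mess_text and line are only illustrative. It relies on Element.itertext(), which walks an element's own text, its children's text and their tails in document order.

```python
import xml.etree.ElementTree as ET

xml_text = '''<class id="1" name="good/bad">
<verb>
<token>like</token>
<token>feel</token>
</verb>
<mess>This is <sugg>not</sugg> text</mess>
<id type="incorrect">I'm glad to <marker>unsee you</marker>.</id>
<id type="correct">I'm glad to see you.</id>
</class>'''

root = ET.fromstring(xml_text)

# Each <token> is a separate element, so findall() returns them one by one.
tokens = [token.text for token in root.findall('./verb/token')]

mess = root.find('mess')
# Flatten mixed content: text before <sugg>, the text of <sugg>, and its tail.
mess_text = ''.join(mess.itertext())
# Text of the first <sugg> child only.
suggestion = mess.findtext('sugg')

# Assumed output format, guessed from the sample line in the question.
line = '({}) {} > {} #{}'.format(
    tokens[0], ' '.join(tokens[1:]), suggestion, mess_text)
print(line)
# -> (like) feel > not #This is not text
```

If you only need the text that sits directly inside <mess> and not the text of its children, read mess.text and the .tail of each child instead of using itertext().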