<?xml version='1.0' encoding='UTF-8'?>
<GateDocument>
<!-- The document content area with serialized nodes -->
<TextWithNodes><Node id="0" />Norway<Node id="6" /> <Node id="7"
/>to<Node id="9" /> <Node id="10" />'<Node id="11" />completely<Node
id="21" /> <Node id="22" />ban<Node id="25" /> <Node id="26"
/>petrol<Node id="32" /> <Node id="33" />powered<Node id="40" /> <Node
id="41" />cars<Node id="45" /> <Node id="46" />by<Node id="48" /> <Node
id="49" />2025<Node id="53" />'<Node id="54" />.<Node id="55" /> .
</TextWithNodes>
</GateDocument>
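In the XML file above, you can see that the words inside the "TextWithNodes" tag are not wrapped in tags of their own. How can I get the text "petrol powered cars" with Python?
Thanks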
Answer 0 (score: 0)
After locating the desired node with findall(), you can call its itertext() method to collect all the text beneath it:
from xml.etree import ElementTree as ET
x = '''<?xml version='1.0' encoding='UTF-8'?>
<GateDocument>
<!-- The document content area with serialized nodes -->
<TextWithNodes><Node id="0" />Norway<Node id="6" /> <Node id="7"
/>to<Node id="9" /> <Node id="10" />'<Node id="11" />completely<Node
id="21" /> <Node id="22" />ban<Node id="25" /> <Node id="26"
/>petrol<Node id="32" /> <Node id="33" />powered<Node id="40" /> <Node
id="41" />cars<Node id="45" /> <Node id="46" />by<Node id="48" /> <Node
id="49" />2025<Node id="53" />'<Node id="54" />.<Node id="55" /> .
</TextWithNodes>
</GateDocument>'''
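# Parse the XML string; t is the root <GateDocument> element.
t = ET.fromstring(x)
# itertext() yields every text fragment under <TextWithNodes> in document order.
print(''.join(t.findall('.//TextWithNodes')[0].itertext()))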
This outputs:
Norway to 'completely ban petrol powered cars by 2025'. .
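The output above is the whole sentence, while the question asks only for "petrol powered cars". A minimal sketch of one way to narrow it down: each word is stored as the tail text of the <Node> element that precedes it, so you can join the tails of the nodes whose id falls in a given range. The helper name text_between and the id range 26 to 45 are assumptions for illustration, read off the sample document. The slice at the end additionally assumes that node ids are character offsets into the text, which holds for this sample.

```python
from xml.etree import ElementTree as ET

# Same document as above, with the XML declaration left out for brevity.
x = '''<GateDocument>
<!-- The document content area with serialized nodes -->
<TextWithNodes><Node id="0" />Norway<Node id="6" /> <Node id="7"
/>to<Node id="9" /> <Node id="10" />'<Node id="11" />completely<Node
id="21" /> <Node id="22" />ban<Node id="25" /> <Node id="26"
/>petrol<Node id="32" /> <Node id="33" />powered<Node id="40" /> <Node
id="41" />cars<Node id="45" /> <Node id="46" />by<Node id="48" /> <Node
id="49" />2025<Node id="53" />'<Node id="54" />.<Node id="55" /> .
</TextWithNodes>
</GateDocument>'''


def text_between(root, start_id, end_id):
    """Hypothetical helper: join the text that follows each <Node>
    whose id lies in the half-open range [start_id, end_id)."""
    nodes = root.find('TextWithNodes').iter('Node')
    return ''.join(
        n.tail or ''
        for n in nodes
        if start_id <= int(n.get('id')) < end_id
    )


root = ET.fromstring(x)

# Nodes 26, 32, 33, 40 and 41 are followed by "petrol", " ", "powered", " ", "cars".
phrase = text_between(root, 26, 45)
print(phrase)

# Alternative, assuming node ids are character offsets into the full text.
full = ''.join(root.find('TextWithNodes').itertext())
print(full[26:45])
```

Both print statements give petrol powered cars for this document. The tail-based version does not depend on the ids matching character positions, so it is the safer choice if you are unsure how the file was produced.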