How to get text that has no tags of its own from an XML file with Python

Time: 2018-10-26 14:19:38

Tags: python regex xml

<?xml version='1.0' encoding='UTF-8'?>
<GateDocument>
<!-- The document content area with serialized nodes -->

<TextWithNodes><Node id="0" />Norway<Node id="6" /> <Node id="7" 
/>to<Node id="9" /> <Node id="10" />'<Node id="11" />completely<Node 
id="21" /> <Node id="22" />ban<Node id="25" /> <Node id="26" 
/>petrol<Node id="32" /> <Node id="33" />powered<Node id="40" /> <Node 
id="41" />cars<Node id="45" /> <Node id="46" />by<Node id="48" /> <Node 
id="49" />2025<Node id="53" />'<Node id="54" />.<Node id="55" /> . 
</TextWithNodes>
</GateDocument>

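In the XML file above, you can see that the words inside the "TextWithNodes" tag are not wrapped in tags of their own. How can I get the text "petrol powered cars" with Python?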

Thanks

1 answer:

Answer 0: (score: 0)

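After locating the node you want with findall(), you can call its itertext() method, which yields the element's own text together with the text that follows each child element: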

from xml.etree import ElementTree as ET
x = '''<?xml version='1.0' encoding='UTF-8'?>
<GateDocument>
<!-- The document content area with serialized nodes -->

<TextWithNodes><Node id="0" />Norway<Node id="6" /> <Node id="7"
/>to<Node id="9" /> <Node id="10" />'<Node id="11" />completely<Node
id="21" /> <Node id="22" />ban<Node id="25" /> <Node id="26"
/>petrol<Node id="32" /> <Node id="33" />powered<Node id="40" /> <Node
id="41" />cars<Node id="45" /> <Node id="46" />by<Node id="48" /> <Node
id="49" />2025<Node id="53" />'<Node id="54" />.<Node id="55" /> .
</TextWithNodes>
</GateDocument>'''
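# Parse the XML string into an element tree
t = ET.fromstring(x)
# Take the first TextWithNodes element and join all of its text fragments,
# including the text that trails each empty <Node /> child
print(''.join(t.findall('.//TextWithNodes')[0].itertext()))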

This outputs:

Norway to 'completely ban petrol powered cars by 2025'. .
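
That prints the whole sentence, while the question asks only for "petrol powered cars". One possible way to narrow it down is to collect the text that trails each Node (its tail) between a start id and an end id. The sketch below is not part of the original answer: the helper name text_between and the choice of ids "26" and "45" are assumptions made for illustration, read off the sample document, and the XML declaration is left out of the sample string for brevity.

```python
from xml.etree import ElementTree as ET

# Same content as the sample document in the question.
xml_doc = """<GateDocument>
<TextWithNodes><Node id="0" />Norway<Node id="6" /> <Node id="7"
/>to<Node id="9" /> <Node id="10" />'<Node id="11" />completely<Node
id="21" /> <Node id="22" />ban<Node id="25" /> <Node id="26"
/>petrol<Node id="32" /> <Node id="33" />powered<Node id="40" /> <Node
id="41" />cars<Node id="45" /> <Node id="46" />by<Node id="48" /> <Node
id="49" />2025<Node id="53" />'<Node id="54" />.<Node id="55" /> .
</TextWithNodes>
</GateDocument>"""


def text_between(root, start_id, end_id):
    """Hypothetical helper: join the tail text of every Node from
    start_id up to, but not including, end_id."""
    parts = []
    collecting = False
    # Iterating over an element yields its direct children in document order.
    for node in root.find('.//TextWithNodes'):
        node_id = node.get('id')
        if node_id == end_id:
            break
        if node_id == start_id:
            collecting = True
        if collecting:
            # The words live in each Node's tail, not in its text.
            parts.append(node.tail or '')
    return ''.join(parts)


root = ET.fromstring(xml_doc)
result = text_between(root, '26', '45')
print(result)
```

With the sample document this collects the tails of Nodes 26 through 41, which gives petrol powered cars. If the ids are not known in advance, you would first need some other way to find the phrase, for example searching the joined string from itertext().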