Consider the following XML (which I have stored in the string variable data):
<?xml version="1.0" encoding="UTF-8"?>
<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope" xmlns:a="http://www.w3.org/2005/08/addressing">
<s:Header>
<a:Action s:mustUnderstand="1">foo</a:Action>
</s:Header>
<s:Body>
<ns1:RetrieveStoryML_Response_1 xmlns:ns0="http://www.bar.com" xmlns:ns1="http://www.foo.bar">
<ns1:StoryMLResponse>
<ns1:STORYML xmlns="http://www.none.com">
<HL space="preserve" xmlns:ns3="http://www.foo.foo.foo">
<ID>12345</ID>
<TE>This is the text I'd really like to get</TE>
</HL>
</ns1:STORYML>
</ns1:StoryMLResponse>
</ns1:RetrieveStoryML_Response_1>
</s:Body>
</s:Envelope>
I am trying to strip out everything that is not inside the HL tag. My expected output is:
<?xml version="1.0" encoding="UTF-8"?>
<HL space="preserve" xmlns:ns3="http://www.foo.foo.foo">
<ID>12345</ID>
<TE>This is the text I'd really like to get</TE>
</HL>
So I load the data:
import xml.etree.ElementTree as ET
root = ET.fromstring(data)
root now looks like this:
<Element '{http://www.w3.org/2003/05/soap-envelope}Envelope' at 0x000000000BE6D4A8>
Then I tried the findall method, following the XPath section of the ElementTree docs:
root.findall('.//HL')
But I get back an empty list. How can I filter this XML effectively?
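A likely cause of the empty list: HL inherits the default namespace declared on ns1:STORYML (xmlns="http://www.none.com"), so ElementTree stores its tag as '{http://www.none.com}HL' and a search for the bare name 'HL' matches nothing. Below is a minimal sketch of one way to handle this, assuming a trimmed copy of the XML above. The prefix "d" and the variable names hl and result are illustrative choices, not part of the original code.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (assumption: same namespaces).
data = """<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope">
<s:Body>
<ns1:RetrieveStoryML_Response_1 xmlns:ns1="http://www.foo.bar">
<ns1:StoryMLResponse>
<ns1:STORYML xmlns="http://www.none.com">
<HL space="preserve" xmlns:ns3="http://www.foo.foo.foo">
<ID>12345</ID>
<TE>This is the text I'd really like to get</TE>
</HL>
</ns1:STORYML>
</ns1:StoryMLResponse>
</ns1:RetrieveStoryML_Response_1>
</s:Body>
</s:Envelope>"""

root = ET.fromstring(data)

# HL inherits the default namespace declared on ns1:STORYML,
# so its real tag is '{http://www.none.com}HL'.
ns = {"d": "http://www.none.com"}  # the prefix "d" is arbitrary
hl = root.find(".//d:HL", ns)

# Drop the namespace part of each tag so the output uses bare names.
# Note: this modifies the parsed tree in place.
for el in hl.iter():
    if el.tag.startswith("{"):
        el.tag = el.tag.split("}", 1)[1]

hl.tail = None  # discard the whitespace that followed </HL> in the source
result = ET.tostring(hl, encoding="unicode")
print(result)
```

This sketch does not reproduce the expected output exactly: ElementTree does not keep the unused xmlns:ns3 declaration, and ET.tostring with encoding="unicode" emits no XML declaration line. If the tags should stay namespace-qualified, the renaming loop can be skipped, in which case ElementTree writes HL with a generated prefix such as ns0.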