My XML:
<sample>
<sample1>
<xi:something href="sample.html" tags="something"/>
<xi:something href="sample.html" tags="something"/>
<xi:something href="sample.html" tags="something"/>
<xi:something href="sample.html" tags="something"/>
</sample1>
<sample2>
<xi:something href="sample.html" tags="something"/>
<xi:something href="sample.html" tags="something"/>
<xi:something href="sample.html" tags="something"/>
<xi:something href="sample.html" tags="something"/>
</sample2>
</sample>
I have to find all <xi:something> elements using Python.
I have already tried the lxml and xml libraries on Python 3.6, but neither found the tags named xi:something.
Answer 0 (score: 1)
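You should first fix the XML by declaring a proper namespace for the xi prefix. Without that declaration the prefix is unbound and the document is not namespace-well-formed. Let the file so.xml contain this: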
<?xml version="1.0"?>
<sample xmlns:xi="urn:xi">
<sample1>
<xi:something href="sample.html" tags="something"/>
<xi:something href="sample.html" tags="something"/>
<xi:something href="sample.html" tags="something"/>
<xi:something href="sample.html" tags="something"/>
</sample1>
<sample2>
<xi:something href="sample.html" tags="something"/>
<xi:something href="sample.html" tags="something"/>
<xi:something href="sample.html" tags="something"/>
<xi:something href="sample.html" tags="something"/>
</sample2>
</sample>
Then you can use XPath and namespaces:
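from lxml import etree

# Parse the file; etree.parse accepts a filename directly.
x = etree.parse("so.xml")
# Map the xi prefix to the namespace URI declared in so.xml.
something = x.xpath("//xi:something", namespaces={"xi": "urn:xi"})
for s in something:
    print(s.tag)
    print(s.get("href"))
    print(s.get("tags"))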
The output will be:
{urn:xi}something
sample.html
something
{urn:xi}something
sample.html
something
...
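
Since the question also mentions the standard-library xml package, here is a minimal sketch of the same lookup with xml.etree.ElementTree, assuming the XML declares xmlns:xi="urn:xi" as above. The XML is inlined and shortened so the snippet runs on its own; the names XML_TEXT, NAMESPACES and find_something are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of so.xml, inlined so the example is self-contained.
XML_TEXT = """<?xml version="1.0"?>
<sample xmlns:xi="urn:xi">
  <sample1>
    <xi:something href="sample.html" tags="something"/>
    <xi:something href="sample.html" tags="something"/>
  </sample1>
  <sample2>
    <xi:something href="sample.html" tags="something"/>
  </sample2>
</sample>
"""

# Prefix-to-URI mapping; the URI must match the xmlns:xi declaration.
NAMESPACES = {"xi": "urn:xi"}


def find_something(xml_text):
    """Return every xi:something element at any depth below the root."""
    root = ET.fromstring(xml_text)
    return root.findall(".//xi:something", NAMESPACES)


matches = find_something(XML_TEXT)
for element in matches:
    print(element.tag, element.get("href"), element.get("tags"))
```

The prefix used in the query only has to match a key in the mapping; what the parser compares is the namespace URI, which is why the tag prints as {urn:xi}something.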