LXML: how to find all custom tags such as xi:something in XML

Asked: 2018-03-22 10:57:23

Tags: python lxml

My XML:

<sample>
    <sample1>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
    </sample1>
    <sample2>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
    </sample2>
</sample>

I need to find all <xi:something> elements using Python. I have tried both the lxml and xml libraries with Python 3.6, but neither finds the tags named xi:something.

1 answer:

Answer 0: (score: 1)

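You should first fix the XML by declaring a proper namespace for the xi prefix. The URI urn:xi used here is only a placeholder; what matters is that the document and the query use the same one. Let the file so.xml contain this: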

<?xml version="1.0"?>
<sample xmlns:xi="urn:xi">
    <sample1>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
    </sample1>
    <sample2>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
    </sample2>
</sample>
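
To see why the declaration matters, here is a minimal sketch that parses the XML with and without it. It uses the standard library's xml.etree.ElementTree (the "xml library" mentioned in the question) so it runs without extra packages; lxml should reject the undeclared prefix in a similar way by default, though its exception type and message differ. The names UNDECLARED, DECLARED and try_parse are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The question's XML, shortened: the "xi" prefix is used but never declared.
UNDECLARED = """<sample>
    <sample1>
        <xi:something href="sample.html" tags="something"/>
    </sample1>
</sample>"""

# The same XML with the prefix bound to the placeholder URI from the answer.
DECLARED = UNDECLARED.replace("<sample>", '<sample xmlns:xi="urn:xi">')


def try_parse(text):
    """Return the parsed root element, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as error:
        print("parse failed:", error)
        return None


broken_root = try_parse(UNDECLARED)  # fails: the xi prefix is unbound
fixed_root = try_parse(DECLARED)     # succeeds once xmlns:xi is declared
```

So the search was not the problem in the original attempt: without the xmlns:xi declaration the document cannot be parsed as namespace-aware XML at all.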

Then you can use XPath with a namespace mapping:

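from lxml import etree

# Pass the filename directly; lxml opens and closes the file itself.
x = etree.parse("so.xml")

# Map the xi prefix to the namespace URI declared in so.xml.
something = x.xpath("//xi:something", namespaces={"xi": "urn:xi"})
for s in something:
    print(s.tag)
    print(s.get("href"))
    print(s.get("tags"))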

The output will be:

{urn:xi}something
sample.html
something
{urn:xi}something
sample.html
something
...
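
The {urn:xi}something form printed for s.tag is the fully qualified tag name (namespace URI in braces, then the local name), and it can be used directly for searching instead of XPath. The following is a sketch under a few assumptions: the XML is inlined and shortened rather than read from so.xml, and the import falls back to the standard library's ElementTree when lxml is not installed, since both offer the iter and findall calls used here.

```python
# lxml is preferred; the standard library exposes the same calls used below.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Shortened copy of so.xml, inlined so the example is self-contained.
XML = b"""<sample xmlns:xi="urn:xi">
    <sample1>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
    </sample1>
    <sample2>
        <xi:something href="sample.html" tags="something"/>
    </sample2>
</sample>"""

root = etree.fromstring(XML)

# Qualified name: "{namespace-uri}localname", the same form s.tag prints.
XI_SOMETHING = "{urn:xi}something"

# Option 1: walk the whole tree and keep elements with that qualified name.
by_iter = list(root.iter(XI_SOMETHING))

# Option 2: ElementPath search with a prefix-to-URI mapping.
by_findall = root.findall(".//xi:something", namespaces={"xi": "urn:xi"})

for element in by_iter:
    print(element.tag, element.get("href"), element.get("tags"))
```

Both options select the same elements as the XPath query. The prefix in the mapping is only a local alias, so it does not have to match the prefix used in the file as long as the URI is the same.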