Question

My XML:

<sample>
    <sample1>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
    </sample1>
    <sample2>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
    </sample2>
</sample>

I need to find all <xi:something> elements using Python. I have already tried the lxml and xml libraries with Python 3.6, but neither found any tags named xi:something.

Answer 1

You should first fix the XML by declaring a proper namespace for the xi prefix. Let the file so.xml contain the following:

<?xml version="1.0"?>
<sample xmlns:xi="urn:xi">
    <sample1>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
    </sample1>
    <sample2>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
    </sample2>
</sample>
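To illustrate why the declaration matters, here is a minimal sketch using the standard library's xml.etree.ElementTree. The shortened BROKEN and FIXED strings and the parses helper are assumptions made for this example, not part of the original question. A namespace-aware parser rejects a prefix that is never bound to a URI, which is one likely reason the original document could not be searched.

```python
import xml.etree.ElementTree as ET

# Shortened copies of the question's XML (hypothetical, for illustration only).
BROKEN = (
    '<sample><sample1>'
    '<xi:something href="sample.html" tags="something"/>'
    '</sample1></sample>'
)
FIXED = (
    '<sample xmlns:xi="urn:xi"><sample1>'
    '<xi:something href="sample.html" tags="something"/>'
    '</sample1></sample>'
)


def parses(xml_text):
    """Return True if xml_text is well-formed, namespace-valid XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised when the xi prefix has no xmlns:xi declaration.
        return False


broken_ok = parses(BROKEN)
fixed_ok = parses(FIXED)
print(broken_ok, fixed_ok)
```

The URI urn:xi is only a placeholder. Any URI works, as long as the same one is used when querying.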

Then you can use XPath with namespaces:

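from lxml import etree

# Parse by path so lxml opens and closes the file itself.
x = etree.parse("so.xml")

# Map the prefix used in the XPath to the namespace URI declared in so.xml.
something = x.xpath("//xi:something", namespaces={"xi": "urn:xi"})
for s in something:
    print(s.tag)  # qualified name in {uri}localname form
    print(s.get("href"))
    print(s.get("tags"))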

The output will be:

{urn:xi}something
sample.html
something
{urn:xi}something
sample.html
something
...
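Since the question also mentions the xml library, the same lookup can be sketched with the standard library's xml.etree.ElementTree. The inline XML string is a shortened copy of so.xml, and the variable names are assumptions for this example. It shows two equivalent ways to address the elements: a prefix map passed to findall, and the {uri}localname form that appears in the output above.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of so.xml (hypothetical, for illustration only).
XML = """<sample xmlns:xi="urn:xi">
    <sample1>
        <xi:something href="sample.html" tags="something"/>
        <xi:something href="sample.html" tags="something"/>
    </sample1>
    <sample2>
        <xi:something href="sample.html" tags="something"/>
    </sample2>
</sample>"""

root = ET.fromstring(XML)

# Option 1: map a prefix to the namespace URI and use it in the path.
ns = {"xi": "urn:xi"}
by_prefix = root.findall(".//xi:something", ns)

# Option 2: spell out the qualified name as {uri}localname, no prefix map needed.
by_qualified_name = list(root.iter("{urn:xi}something"))

for el in by_prefix:
    print(el.tag, el.get("href"), el.get("tags"))
```

In both cases the match is made on the namespace URI, so the prefix in the query does not have to be the one used in the document.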

LXML: how to find all custom tags such as xi:something in XML
