Question

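I've been trying to parse some XML for a few hours now with no luck. I've checked similar threads and read the ElementTree documentation, but I'm still completely lost.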

Basically, I receive some XML output from a router, stored in a string, and I have to parse it for certain specific information.

Here is a sample of the XML I'm working with:

xml = """<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1D0/junos">
        <route-information xmlns="http://xml.juniper.net/junos/14.1D0/junos-routing">
            <!-- keepalive -->
            <route-table>
                <table-name>inet.0</table-name>
                <destination-count>52</destination-count>
                <total-route-count>52</total-route-count>
                <active-route-count>52</active-route-count>
                <holddown-route-count>0</holddown-route-count>
                <hidden-route-count>0</hidden-route-count>
                <rt junos:style="brief">
                    <rt-destination>5.5.5.5/32</rt-destination>
                    <rt-entry>
                        <active-tag>*</active-tag>
                        <current-active/>
                        <last-active/>
                        <protocol-name>Direct</protocol-name>
                        <preference>0</preference>
                        <age junos:seconds="428929">4d 23:08:49</age>
                        <nh>
                            <selected-next-hop/>
                            <via>lo0.0</via>
                        </nh>
                    </rt-entry>
                </rt>
            </route-table>
        </route-information>
        <cli>
            <banner></banner>
        </cli>
</rpc-reply>"""

For example, the node whose content I'd like to get/print is rt-destination.

I've tried this:

root = ET.fromstring(xml)

values = root.find('rt')
for element in values:
    print element.text

this,

value= root.find('rt-destination')

print value

and this, which sets root (a pointer?) at a specific node:

x = root.getiterator(tag = "destination-count")

Any help on how to traverse to this particular node, or how to get the desired result, would be much appreciated.

Answer 1

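The reason your code doesn't work is the namespace. If the namespace is always the same, you can hard-code it as a prefix on the tag you're looking for: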

import xml.etree.ElementTree as ET

xml = """
<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1D0/junos">
    <route-information xmlns="http://xml.juniper.net/junos/14.1D0/junos-routing">
        <!-- keepalive -->
        <route-table>
            <table-name>inet.0</table-name>
            <destination-count>52</destination-count>
            <total-route-count>52</total-route-count>
            <active-route-count>52</active-route-count>
            <holddown-route-count>0</holddown-route-count>
            <hidden-route-count>0</hidden-route-count>
            <rt junos:style="brief">
                <rt-destination>5.5.5.5/32</rt-destination>
                <rt-entry>
                    <active-tag>*</active-tag>
                    <current-active/>
                    <last-active/>
                    <protocol-name>Direct</protocol-name>
                    <preference>0</preference>
                    <age junos:seconds="428929">4d 23:08:49</age>
                    <nh>
                        <selected-next-hop/>
                        <via>lo0.0</via>
                    </nh>
                </rt-entry>
            </rt>
        </route-table>
    </route-information>
    <cli>
        <banner></banner>
    </cli>
</rpc-reply>
"""

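# Expanded tag names have the form '{namespace-uri}local-name'.
XML_NAMESPACE = '{http://xml.juniper.net/junos/14.1D0/junos-routing}'
root = ET.fromstring(xml)
# iter() walks the whole tree and yields every element with the given tag.
rt_nodes = root.iter(tag='{}rt-destination'.format(XML_NAMESPACE))
print(next(rt_nodes).text)  # 5.5.5.5/32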

If you need something more flexible, take a look at the answer here.
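One possible way to stay independent of the exact namespace URI (which includes the Junos version string) is to compare only the local part of each tag. The following is a minimal sketch using only the standard library on Python 3. The helper name local_name and the shortened sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample with the same namespace layout as the router output.
SAMPLE = """<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1D0/junos">
    <route-information xmlns="http://xml.juniper.net/junos/14.1D0/junos-routing">
        <route-table>
            <rt junos:style="brief">
                <rt-destination>5.5.5.5/32</rt-destination>
            </rt>
        </route-table>
    </route-information>
</rpc-reply>"""


def local_name(tag):
    """Return the tag without its '{namespace}' part (illustrative helper)."""
    return tag.rsplit('}', 1)[-1]


root = ET.fromstring(SAMPLE)

# Walk every element and keep those whose local name matches,
# whatever namespace URI they happen to carry.
destinations = [
    el.text
    for el in root.iter()
    if isinstance(el.tag, str) and local_name(el.tag) == 'rt-destination'
]
print(destinations)
```

On Python 3.8 and later, the built-in wildcard syntax root.findall('.//{*}rt-destination') should select the same elements without a helper.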

Answer 2

You are missing the namespace of the route-information tag. Your XML has two namespaces and, unfortunately, the one you need has no prefix.

<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1D0/junos">
    <route-information xmlns="http://xml.juniper.net/junos/14.1D0/junos-routing">

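rpc-reply only declares the junos prefix and is itself in no namespace. The next level and everything beneath it, however, belongs to the default (unprefixed) namespace xmlns="http://xml.juniper.net/junos/14.1D0/junos-routing".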
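To check which namespace each element actually ends up in, it can help to print the expanded tag names the parser produces. This is a small sketch with the standard library's ElementTree on Python 3. The shortened sample document is an assumption made for brevity.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample with the same namespace declarations.
SAMPLE = """<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1D0/junos">
    <route-information xmlns="http://xml.juniper.net/junos/14.1D0/junos-routing">
        <route-table>
            <rt junos:style="brief">
                <rt-destination>5.5.5.5/32</rt-destination>
            </rt>
        </route-table>
    </route-information>
</rpc-reply>"""

root = ET.fromstring(SAMPLE)

# ElementTree stores namespaced tags as '{namespace-uri}local-name';
# elements outside any namespace keep their plain name.
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)
```

The root prints as a bare rpc-reply, while route-information and its descendants carry the junos-routing URI in braces. That is why a plain find('rt') or find('rt-destination') matches nothing.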

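root.nsmap (an lxml.etree feature; the standard library's ElementTree has no nsmap attribute) gives the following namespace dictionary for the root level: {'junos': 'http://xml.juniper.net/junos/14.1D0/junos'}. An element in that namespace could be looked up by its prefix like this; note that in this document rt is not in the junos namespace, so this particular call returns None: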

root.find('junos:rt', namespaces=root.nsmap)

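At the next level, however, lxml.etree knows about the namespace "http://xml.juniper.net/junos/14.1D0/junos-routing", but because it has no prefix, it is added to the namespace map with None as the dictionary key.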

>>> nsmap = root[0].nsmap
>>> nsmap
{'junos': 'http://xml.juniper.net/junos/14.1D0/junos',
 None: 'http://xml.juniper.net/junos/14.1D0/junos-routing'}

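Well, this is a problem, because we can't reference a namespace using None. One option is to create a new entry in the dictionary that maps a prefix of our own to 'http://xml.juniper.net/junos/14.1D0/junos-routing'.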

nsmap['my_ns'] = nsmap.pop(None)

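We need to use .pop here because lxml's xpath() does not accept a namespace map that contains a None key. Now you can use xpath to search for the rt-destination tags and return only the text inside them.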

root.xpath('.//my_ns:rt-destination/text()', namespaces=nsmap)
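For readers who only have the standard library available, roughly the same lookup can be sketched with xml.etree.ElementTree and a hand-written prefix map. ElementTree's limited XPath has no text() step, so the text is read from each element instead. The prefix r, the dictionary name NS, and the shortened sample are illustrative choices, not part of the router output.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample with the same namespace layout.
SAMPLE = """<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1D0/junos">
    <route-information xmlns="http://xml.juniper.net/junos/14.1D0/junos-routing">
        <route-table>
            <rt junos:style="brief">
                <rt-destination>5.5.5.5/32</rt-destination>
                <rt-entry>
                    <age junos:seconds="428929">4d 23:08:49</age>
                </rt-entry>
            </rt>
        </route-table>
    </route-information>
</rpc-reply>"""

# Prefixes are local to this dictionary; only the URIs must match the document.
NS = {
    'r': 'http://xml.juniper.net/junos/14.1D0/junos-routing',
    'junos': 'http://xml.juniper.net/junos/14.1D0/junos',
}

root = ET.fromstring(SAMPLE)

# Find every rt-destination at any depth and collect its text.
texts = [el.text for el in root.findall('.//r:rt-destination', NS)]

# Namespaced attributes are addressed with the expanded '{uri}name' form.
age = root.find('.//r:age', NS)
seconds = age.get('{%s}seconds' % NS['junos'])

print(texts, seconds)
```

Because the prefix map is written by hand, there is no None key to work around. The trade-off is that the URI, including its version string, is hard-coded.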

ElementTree: parsing XML great-grandchildren
