How do I parse and retrieve only the required XML elements from an XML file using Python?

Time: 2018-08-20 11:03:29

Tags: python json xml

I have an XML file that looks like this:

<rpc-reply xmlns:junos="http://xml.juniper.net/junos/15.1R5/junos">
    <vlan-information xmlns="http://xml.juniper.net/junos/15.1R5/junos-esw" junos:style="brief">
        <vlan-terse/>
        <vlan>
            <vlan-instance>0</vlan-instance>
            <vlan-name>ACRS-Dev2</vlan-name>
            <vlan-create-time>Fri Jan  1 00:37:59 2010
            </vlan-create-time>
            <vlan-status>Enabled</vlan-status>
            <vlan-owner>static</vlan-owner>
            <vlan-tag>0</vlan-tag>
            <vlan-index>2</vlan-index>
            <vlan-l3-interface>vlan.15 (UP)</vlan-l3-interface>
            <vlan-l3-interface-address>10.8.25.1/24</vlan-l3-interface-address>
            <vlan-protocol-port>Port Mode</vlan-protocol-port>
            <vlan-members-count>7</vlan-members-count>
            <vlan-members-upcount>6</vlan-members-upcount>
        </vlan>
        <vlan>
            <vlan-instance>0</vlan-instance>
            <vlan-name>default</vlan-name>
            <vlan-create-time>Fri Jan  1 00:37:59 2010
            </vlan-create-time>
            <vlan-status>Enabled</vlan-status>
            <vlan-owner>static</vlan-owner>
            <vlan-tag>0</vlan-tag>
            <vlan-index>3</vlan-index>
            <vlan-l3-interface>vlan.11 (UP)</vlan-l3-interface>
            <vlan-l3-interface-address>10.8.27.1/24</vlan-l3-interface-address>
            <vlan-protocol-port>Port Mode</vlan-protocol-port>
            <vlan-members-count>12</vlan-members-count>
            <vlan-members-upcount>2</vlan-members-upcount>
        </vlan>
    </vlan-information>
</rpc-reply>

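From this, I want only the <vlan-name> and <vlan-l3-interface-address> tags to be parsed and saved in a dict / json-like variable, in this format:

{'Vlan-Name' : vlan_name, 'Interface-Address' : interface_addr}

and then I want to append one such dict / json per element to a list of dicts / json. This is the code I am using to parse the XML and append the dicts to the list: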

root = tree.getroot()
nw_pool = []
nw_json = {}
for child in root:
    for items in child:
        for item1 in items:
            if 'vlan-l3-interface-address' in item1.tag:
                interface_addr = item1.text
                nw_json['Interface-Address'] = interface_addr
            elif 'vlan-name' in item1.tag:
                vlan_name = item1.text
                nw_json['Vlan-Name'] = vlan_name
                nw_pool.append(nw_json)
print(nw_pool)

But when I print nw_pool, the output repeats the dict of the last element found, instead of giving me a different dict for each element.

Output:

[{'Vlan-Name': 'default', 'Interface-Address': '10.8.27.1/24'}, {'Vlan-Name': 'default', 'Interface-Address': '10.8.27.1/24'}]

The output I want is:

[{'Vlan-Name': 'ACRS-Dev2', 'Interface-Address': '10.8.25.1/24'}, {'Vlan-Name': 'default', 'Interface-Address': '10.8.27.1/24'}] 

Can anyone help me? Thanks in advance.

2 answers:

Answer 0: (score: 1)

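You are overwriting the existing dictionary, whereas you need a new dictionary for each iteration. Because nw_json is created only once, every append adds a reference to that same object, so all list entries end up showing the last values written. So you need to move nw_json = {} inside the loop that iterates over the <vlan> elements: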

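import xml.etree.ElementTree as ET

# Assumption: the XML is stored in 'test.xml'; the question does not show how tree is created
tree = ET.parse('test.xml')
root = tree.getroot()
nw_pool = []
for child in root:
    for items in child:
        nw_json = {}   # Work with a new dict for every child of <vlan-information>
        for item1 in items:
            # Tags carry a namespace prefix, hence the substring test
            if 'vlan-l3-interface-address' in item1.tag:
                interface_addr = item1.text
                nw_json['Interface-Address'] = interface_addr
            elif 'vlan-name' in item1.tag:
                vlan_name = item1.text
                nw_json['Vlan-Name'] = vlan_name
                # Appended here; the address is added to this same dict later
                nw_pool.append(nw_json)
print(nw_pool)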
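
If you would rather not rely on substring checks against item1.tag, here is a minimal sketch that looks the elements up by their namespace instead. It assumes the same structure as the XML in the question (abridged below to the two fields of interest). The prefix "esw", the constant names and the helper extract_vlans are made up for illustration; only the namespace URI comes from the question.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the XML from the question (assumption: same layout, fewer fields).
SAMPLE_XML = """<rpc-reply xmlns:junos="http://xml.juniper.net/junos/15.1R5/junos">
    <vlan-information xmlns="http://xml.juniper.net/junos/15.1R5/junos-esw" junos:style="brief">
        <vlan-terse/>
        <vlan>
            <vlan-name>ACRS-Dev2</vlan-name>
            <vlan-l3-interface-address>10.8.25.1/24</vlan-l3-interface-address>
        </vlan>
        <vlan>
            <vlan-name>default</vlan-name>
            <vlan-l3-interface-address>10.8.27.1/24</vlan-l3-interface-address>
        </vlan>
    </vlan-information>
</rpc-reply>"""

# "esw" is an arbitrary prefix chosen for this sketch; only the URI must match the file.
NS = {"esw": "http://xml.juniper.net/junos/15.1R5/junos-esw"}


def extract_vlans(root):
    """Hypothetical helper: build one new dict per <vlan> element."""
    pool = []
    for vlan in root.findall(".//esw:vlan", NS):
        # A dict literal inside the loop creates a fresh object on every iteration.
        pool.append({
            "Vlan-Name": vlan.findtext("esw:vlan-name", default="", namespaces=NS).strip(),
            "Interface-Address": vlan.findtext(
                "esw:vlan-l3-interface-address", default="", namespaces=NS
            ).strip(),
        })
    return pool


nw_pool = extract_vlans(ET.fromstring(SAMPLE_XML))
print(nw_pool)
```

For a file on disk you would pass ET.parse(path).getroot() instead of ET.fromstring(SAMPLE_XML). The default="" argument only keeps the sketch from failing when a <vlan> lacks one of the two tags; whether an empty string is the right fallback depends on your data.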

Answer 1: (score: 1)

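The problem in your code is that you initialized the dict() object before the loop, so its data kept getting overwritten as the loop ran.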
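
To see the effect in isolation, here is a small sketch with no XML involved. The variable names (shared, aliased, fresh) and the hard-coded VLAN names are only for illustration.

```python
names = ["ACRS-Dev2", "default"]

# One dict created before the loop: every append stores a reference to it.
shared = {}
aliased = []
for name in names:
    shared["Vlan-Name"] = name   # overwrites the previous value in place
    aliased.append(shared)       # same object appended again

# A new dict created inside the loop: every entry is independent.
fresh = []
for name in names:
    entry = {"Vlan-Name": name}
    fresh.append(entry)

print(aliased)                   # both entries show 'default'
print(aliased[0] is aliased[1])  # True: the list holds the same object twice
print(fresh)                     # one entry per name
```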

@Hoenie's answer shows exactly where the mistake is.

In addition, I suggest you try parsing the XML with BeautifulSoup, since it is simple and easy to understand. Try the code below.

from bs4 import BeautifulSoup

# Read the file inside a with block so it is closed once the content is loaded
with open('test.xml') as file_obj:
    soup = BeautifulSoup(file_obj.read(), 'lxml')

vlans = soup.find_all('vlan')
nw_pool = []
for vlan in vlans:
    nw_json = dict()  # new dict for every <vlan>
    nw_json['Interface-Address'] = vlan.find('vlan-l3-interface-address').text
    nw_json['Vlan-Name'] = vlan.find('vlan-name').text  # key spelled as in the question
    nw_pool.append(nw_json)
print(nw_pool) # O/P [{'Interface-Address': '10.8.25.1/24', 'Vlan-Name': 'ACRS-Dev2'}, {'Interface-Address': '10.8.27.1/24', 'Vlan-Name': 'default'}]

Cheers!