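Iterate over XML and get only selected items

Time: 2020-03-26 21:37:35

Tags: xml python-2.7

I'm querying a system that returns XML output with a large number of elements, but I'm having trouble picking out only certain items.

When I search on the host_sn tag alone, my code only adds the snapshot named host_sn at generation 0, even though many generations exist. The XML output below is a sample containing generations 0 and 1.

How can I iterate over the child elements and put every generation and snapshot name into a dictionary? Below is an example of the output I want.

Desired output: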


    # There should be exactly one entry per generation (0, 1, 2, 3, ...); the same generation never appears twice
    [{'generation': '0', 'timestamp': 'Thu Mar 26 22:10:55 2020', 'snapshot_link': 'No', 'snapshot_name': 'host_sn'}]
    [{'generation': '1', 'timestamp': 'Thu Mar 26 22:20:55 2020', 'snapshot_link': 'No', 'snapshot_name': 'host_sn'}]
    [{'generation': '2', 'timestamp': 'Thu Mar 26 22:30:55 2020', 'snapshot_link': 'No', 'snapshot_name': 'host_sn'}]

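My code, which appends only generation 0 to the list of dictionaries (I have played with this code and can also get it to add every item, rather than only the unique items shown in the desired output above):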

    snapshot = []

    # Using Python ElementTree; run_to_xml just adds "-output xml_element" to the command
    # example cmd: snapcmd -sg host_sg list -detail -output xml_element -sid 6161
    snap_xml = self.run_to_xml("snapcmd -sg " + source_sg + " list -detail", True)

    if snap_xml is not None:
        for sn_item in snap_xml.findall('SG/Snapvx/Snapshot'):
            sn_name = sn_item.find('snapshot_name').text
            sn_timestamp = sn_item.find('timestamp').text
            sn_generation = sn_item.find('generation').text
            sn_link = sn_item.find('link').text

            sn_list = {}
            if sn_name.endswith(SNAPSHOTVX_NAME_POSTFIX):  # Postfix is _sn
                # Membership is tested on the snapshot name only
                if sn_name not in [s['snapshot_name'] for s in snapshot]:
                    sn_list['generation'] = sn_generation
                    sn_list['snapshot_name'] = sn_name
                    sn_list['timestamp'] = sn_timestamp
                    sn_list['snapshot_link'] = sn_link
                    snapshot.append(sn_list)
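
The effect of the name-only membership test can be reproduced in isolation. The following is a minimal sketch with made-up rows standing in for the parsed XML; the `rows` list and the `dedupe_by_name` helper are hypothetical names used only for illustration. Once one `host_sn` entry is in the list, every later row with the same name is rejected, whatever its generation.

```python
# Hypothetical stand-in for values parsed from the <Snapshot> elements.
rows = [
    {'snapshot_name': 'host_sn', 'generation': '0'},
    {'snapshot_name': 'host_sn', 'generation': '1'},
    {'snapshot_name': 'host_sn', 'generation': '0'},
]


def dedupe_by_name(rows):
    """Keep a row only if its snapshot_name has not been seen yet."""
    kept = []
    for row in rows:
        # Same test as in the code above: only the name is compared.
        if row['snapshot_name'] not in [k['snapshot_name'] for k in kept]:
            kept.append(row)
    return kept


kept = dedupe_by_name(rows)
print(kept)
```

Only the first row survives here, which matches the behaviour described: generation 0 is appended and generation 1 is skipped.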

Sample XML output:


    <?xml version="1.0" standalone="yes" ?>
    <SymCLI_ML>
      <SG>
        <SG_Info>
          <name>host_sn</name>
          <symid>0001##00####</symid>
          <microcode_version>6161</microcode_version>
        </SG_Info>
        <Snapvx>
          <Snapshot>
            <source>000F9</source>
            <snapshot_name>host_sn</snapshot_name>
            <timestamp>Thu Mar 26 16:05:37 2020</timestamp>
            <generation>0</generation>
            <link>No</link>
            <restore>No</restore>
            <failed>No</failed>
            <GCM>False</GCM>
            <zDP>False</zDP>
            <total_deltas_mb>34</total_deltas_mb>
            <total_deltas_gb>0.0</total_deltas_gb>
            <total_deltas_tb>0.00</total_deltas_tb>
            <total_deltas_tracks>268</total_deltas_tracks>
            <non_shared_mb>10</non_shared_mb>
            <non_shared_gb>0.0</non_shared_gb>
            <non_shared_tb>0.00</non_shared_tb>
            <non_shared_tracks>76</non_shared_tracks>
            <expiration_date>Fri Mar 27 16:05:37 2020</expiration_date>
          </Snapshot>
          <Snapshot>
            <source>000F9</source>
            <snapshot_name>host_sn</snapshot_name>
            <timestamp>Thu Mar 26 15:53:39 2020</timestamp>
            <generation>1</generation>
            <link>No</link>
            <restore>No</restore>
            <failed>No</failed>
            <GCM>False</GCM>
            <zDP>False</zDP>
            <total_deltas_mb>45</total_deltas_mb>
            <total_deltas_gb>0.0</total_deltas_gb>
            <total_deltas_tb>0.00</total_deltas_tb>
            <total_deltas_tracks>361</total_deltas_tracks>
            <non_shared_mb>21</non_shared_mb>
            <non_shared_gb>0.0</non_shared_gb>
            <non_shared_tb>0.00</non_shared_tb>
            <non_shared_tracks>169</non_shared_tracks>
            <expiration_date>Fri Mar 27 15:53:39 2020</expiration_date>
          </Snapshot>
          <Snapshot>
            <source>000FA</source>
            <snapshot_name>host_sn</snapshot_name>
            <timestamp>Thu Mar 26 16:05:37 2020</timestamp>
            <generation>0</generation>
            <link>No</link>
            <restore>No</restore>
            <failed>No</failed>
            <GCM>False</GCM>
            <zDP>False</zDP>
            <total_deltas_mb>7</total_deltas_mb>
            <total_deltas_gb>0.0</total_deltas_gb>
            <total_deltas_tb>0.00</total_deltas_tb>
            <total_deltas_tracks>53</total_deltas_tracks>
            <non_shared_mb>3</non_shared_mb>
            <non_shared_gb>0.0</non_shared_gb>
            <non_shared_tb>0.00</non_shared_tb>
            <non_shared_tracks>21</non_shared_tracks>
            <expiration_date>Fri Mar 27 16:05:37 2020</expiration_date>
          </Snapshot>
          <Snapshot>
            <source>000FA</source>
            <snapshot_name>host_sn</snapshot_name>
            <timestamp>Thu Mar 26 15:53:39 2020</timestamp>
            <generation>1</generation>
            <link>No</link>
            <restore>No</restore>
            <failed>No</failed>
            <GCM>False</GCM>
            <zDP>False</zDP>
            <total_deltas_mb>8</total_deltas_mb>
            <total_deltas_gb>0.0</total_deltas_gb>
            <total_deltas_tb>0.00</total_deltas_tb>
            <total_deltas_tracks>61</total_deltas_tracks>
            <non_shared_mb>4</non_shared_mb>
            <non_shared_gb>0.0</non_shared_gb>
            <non_shared_tb>0.00</non_shared_tb>
            <non_shared_tracks>29</non_shared_tracks>
            <expiration_date>Fri Mar 27 15:53:39 2020</expiration_date>
          </Snapshot>
          <Snapshot>
            <source>000FB</source>
            <snapshot_name>host_sn</snapshot_name>
            <timestamp>Thu Mar 26 16:05:37 2020</timestamp>
            <generation>0</generation>
            <link>No</link>
            <restore>No</restore>
            <failed>No</failed>
            <GCM>False</GCM>
            <zDP>False</zDP>
            <total_deltas_mb>0</total_deltas_mb>
            <total_deltas_gb>0.0</total_deltas_gb>
            <total_deltas_tb>0.00</total_deltas_tb>
            <total_deltas_tracks>3</total_deltas_tracks>
            <non_shared_mb>0</non_shared_mb>
            <non_shared_gb>0.0</non_shared_gb>
            <non_shared_tb>0.00</non_shared_tb>
            <non_shared_tracks>1</non_shared_tracks>
            <expiration_date>Fri Mar 27 16:05:37 2020</expiration_date>
          </Snapshot>
          <Snapshot>
            <source>000FB</source>
            <snapshot_name>host_sn</snapshot_name>
            <timestamp>Thu Mar 26 15:53:39 2020</timestamp>
            <generation>1</generation>
            <link>No</link>
            <restore>No</restore>
            <failed>No</failed>
            <GCM>False</GCM>
            <zDP>False</zDP>
            <total_deltas_mb>0</total_deltas_mb>
            <total_deltas_gb>0.0</total_deltas_gb>
            <total_deltas_tb>0.00</total_deltas_tb>
            <total_deltas_tracks>3</total_deltas_tracks>
            <non_shared_mb>0</non_shared_mb>
            <non_shared_gb>0.0</non_shared_gb>
            <non_shared_tb>0.00</non_shared_tb>
            <non_shared_tracks>1</non_shared_tracks>
            <expiration_date>Fri Mar 27 15:53:39 2020</expiration_date>
          </Snapshot>
          <Snapshot>
            <source>000FC</source>
            <snapshot_name>host_sn</snapshot_name>
            <timestamp>Thu Mar 26 16:05:37 2020</timestamp>
            <generation>0</generation>
            <link>No</link>
            <restore>No</restore>
            <failed>No</failed>
            <GCM>False</GCM>
            <zDP>False</zDP>
            <total_deltas_mb>20</total_deltas_mb>
            <total_deltas_gb>0.0</total_deltas_gb>
            <total_deltas_tb>0.00</total_deltas_tb>
            <total_deltas_tracks>163</total_deltas_tracks>
            <non_shared_mb>10</non_shared_mb>
            <non_shared_gb>0.0</non_shared_gb>
            <non_shared_tb>0.00</non_shared_tb>
            <non_shared_tracks>78</non_shared_tracks>
            <expiration_date>Fri Mar 27 16:05:37 2020</expiration_date>
          </Snapshot>
          <Snapshot>
            <source>000FC</source>
            <snapshot_name>host_sn</snapshot_name>
            <timestamp>Thu Mar 26 15:53:39 2020</timestamp>
            <generation>1</generation>
            <link>No</link>
            <restore>No</restore>
            <failed>No</failed>
            <GCM>False</GCM>
            <zDP>False</zDP>
            <total_deltas_mb>25</total_deltas_mb>
            <total_deltas_gb>0.0</total_deltas_gb>
            <total_deltas_tb>0.00</total_deltas_tb>
            <total_deltas_tracks>198</total_deltas_tracks>
            <non_shared_mb>14</non_shared_mb>
            <non_shared_gb>0.0</non_shared_gb>
            <non_shared_tb>0.00</non_shared_tb>
            <non_shared_tracks>113</non_shared_tracks>
            <expiration_date>Fri Mar 27 15:53:39 2020</expiration_date>
          </Snapshot>
        </Snapvx>
        <Snapvx_Totals>
          <total_deltas_mb>145698</total_deltas_mb>
          <total_deltas_gb>142.3</total_deltas_gb>
          <total_deltas_tb>0.14</total_deltas_tb>
          <total_deltas_tracks>1165587</total_deltas_tracks>
          <non_shared_mb>362</non_shared_mb>
          <non_shared_gb>0.4</non_shared_gb>
          <non_shared_tb>0.00</non_shared_tb>
          <non_shared_tracks>2893</non_shared_tracks>
        </Snapvx_Totals>
      </SG>
    </SymCLI_ML>

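1 answer:

Answer 0 (score: 0)

You can get there with lxml. Note that your XML is still invalid (the closing tag for <Snapvx> is missing, and there is no snapshot_link element in it). In general, though: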

    from lxml import etree

    generations = """[your xml above, fixed]"""
    doc = etree.fromstring(generations)
    targets = doc.xpath('//Snapshot')
    rows = []
    for target in targets:
        items = {}
        gen = target.xpath('generation')[0]
        ts = target.xpath('timestamp')[0]
        sn = target.xpath('snapshot_name')[0]
        items[gen.tag] = gen.text
        items[ts.tag] = ts.text
        items[sn.tag] = sn.text
        if items not in rows:
            rows.append(items)
    for row in rows:
        print(row)

Output:

{'generation': '0', 'timestamp': 'Thu Mar 26 16:05:37 2020', 'snapshot_name': 'host_sn'}
{'generation': '1', 'timestamp': 'Thu Mar 26 15:53:39 2020', 'snapshot_name': 'host_sn'}
{'generation': '2', 'timestamp': 'Thu Mar 26 15:53:39 2020', 'snapshot_name': 'host_sn'}
{'generation': '2', 'timestamp': 'Thu Mar 26 16:05:37 2020', 'snapshot_name': 'host_sn'}
{'generation': '3', 'timestamp': 'Thu Mar 26 16:05:37 2020', 'snapshot_name': 'host_sn'}
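
The same idea can also be sketched with the standard library's `xml.etree.ElementTree`, which the question already uses. This is a sketch under stated assumptions, not a definitive implementation: `SAMPLE_XML` is a trimmed, hypothetical version of the question's output, `unique_snapshots` is an illustrative helper name, and "unique" is taken to mean one entry per (snapshot_name, generation) pair, keeping the first one encountered. The `link` element is copied into a `snapshot_link` key to match the desired output.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample: two source devices, generations 0 and 1 each.
SAMPLE_XML = """<SymCLI_ML>
  <SG>
    <Snapvx>
      <Snapshot>
        <source>000F9</source>
        <snapshot_name>host_sn</snapshot_name>
        <timestamp>Thu Mar 26 16:05:37 2020</timestamp>
        <generation>0</generation>
        <link>No</link>
      </Snapshot>
      <Snapshot>
        <source>000F9</source>
        <snapshot_name>host_sn</snapshot_name>
        <timestamp>Thu Mar 26 15:53:39 2020</timestamp>
        <generation>1</generation>
        <link>No</link>
      </Snapshot>
      <Snapshot>
        <source>000FA</source>
        <snapshot_name>host_sn</snapshot_name>
        <timestamp>Thu Mar 26 16:05:37 2020</timestamp>
        <generation>0</generation>
        <link>No</link>
      </Snapshot>
      <Snapshot>
        <source>000FA</source>
        <snapshot_name>host_sn</snapshot_name>
        <timestamp>Thu Mar 26 15:53:39 2020</timestamp>
        <generation>1</generation>
        <link>No</link>
      </Snapshot>
    </Snapvx>
  </SG>
</SymCLI_ML>"""


def unique_snapshots(root, postfix='_sn'):
    """Return one dict per (snapshot_name, generation) pair, first seen wins."""
    seen = set()
    snapshots = []
    for item in root.findall('SG/Snapvx/Snapshot'):
        name = item.findtext('snapshot_name')
        generation = item.findtext('generation')
        # Skip snapshots that lack a name or do not carry the expected postfix.
        if name is None or not name.endswith(postfix):
            continue
        # Deduplicate on name AND generation, not on name alone.
        key = (name, generation)
        if key in seen:
            continue
        seen.add(key)
        snapshots.append({
            'generation': generation,
            'timestamp': item.findtext('timestamp'),
            'snapshot_link': item.findtext('link'),
            'snapshot_name': name,
        })
    return snapshots


root = ET.fromstring(SAMPLE_XML)
snapshots = unique_snapshots(root)
for row in snapshots:
    print(row)
```

Keying the `seen` set on the (name, generation) tuple keeps the lookup cheap and means a differing timestamp between source devices would not produce a second entry for the same generation, unlike comparing whole dictionaries.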