Python XML: iterating over multiple blocks

Time: 2013-05-30 00:13:17

Tags: python xml parsing elementtree

I have a Python XML parsing problem that I can't seem to figure out.

I have the following XML:

<data>
  <data_in base="base64">
  </data_in>
  <log_sense_data>
    <ds base="bool">1</ds>
    <spf base="bool">0</spf>
    <page_code base="hex">15</page_code>
    <background_scan_results_log_page>
      <parameter>
        <parameter_code base="hex">0000</parameter_code>
        <du base="bool">0</du>
        <tsd base="bool">0</tsd>
        <etc base="bool">0</etc>
        <tmc base="hex">00</tmc>
        <format_linking base="hex">03</format_linking>
        <parameter_length base="dec">12</parameter_length>
        <description base="string">background scanning status parameter</description>
        <accumulated_power_on_minutes base="dec">579578</accumulated_power_on_minutes>
        <background_scanning_status base="hex">01</background_scanning_status>
        <number_of_background_scans_performed base="dec">112</number_of_background_scans_performed>
        <background_scan_progress base="hex">00000036</background_scan_progress>
        <number_of_background_medium_scans_performed base="dec">112</number_of_background_medium_scans_performed>
      </parameter>
      <parameter>
        <parameter_code base="hex">0001</parameter_code>
        <du base="bool">0</du>
        <tsd base="bool">0</tsd>
        <etc base="bool">0</etc>
        <tmc base="hex">00</tmc>
        <format_linking base="hex">03</format_linking>
        <parameter_length base="dec">20</parameter_length>
        <description base="string">background medium scan parameter</description>
        <accumulated_power_on_minutes base="dec">82932</accumulated_power_on_minutes>
        <reassign_status base="hex">05</reassign_status>
        <sense_key base="hex">01</sense_key>
        <additional_sense_code base="hex">17</additional_sense_code>
        <additional_sense_code_qualifier base="hex">01</additional_sense_code_qualifier>
        <vendor_specific base="hex">20e2570187</vendor_specific>
        <logical_block_address base="hex">00000000478994d8</logical_block_address>
      </parameter>
      <parameter>
        <parameter_code base="hex">0002</parameter_code>
        <du base="bool">0</du>
        <tsd base="bool">0</tsd>
        <etc base="bool">0</etc>
        <tmc base="hex">00</tmc>
        <format_linking base="hex">03</format_linking>
        <parameter_length base="dec">20</parameter_length>
        <description base="string">background medium scan parameter</description>
        <accumulated_power_on_minutes base="dec">104467</accumulated_power_on_minutes>
        <reassign_status base="hex">05</reassign_status>
        <sense_key base="hex">01</sense_key>
        <additional_sense_code base="hex">18</additional_sense_code>
        <additional_sense_code_qualifier base="hex">07</additional_sense_code_qualifier>
        <vendor_specific base="hex">203ab846ea</vendor_specific>
        <logical_block_address base="hex">00000000133d5046</logical_block_address>
      </parameter>
    </background_scan_results_log_page>
  </log_sense_data>
</data>

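Here parameter_code 0000 is always present, and it may be followed by any number of further parameter_codes. Basically, I want to pull 2 values out of parameter_code 0000 (accumulated power-on minutes and number of background scans performed), plus most of the values from parameter_code 0001 and up, so I can put them into a database later. The code I have so far is: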

import xml.etree.ElementTree as et

# results comes from earlier in my script (not shown); this runs once per result
log_page_tree = et.fromstring(results['Data']['RawData'])
log_sense_data = log_page_tree.find('log_sense_data')
if log_sense_data is not None:  # skip results that have no log_sense_data
    for element in log_sense_data:
        for pagecode in element.iter('page_code'):
            if pagecode.text == '15':
                for param in log_sense_data.find('background_scan_results_log_page'):
                    for derp in param.iter():
                        print(derp.tag, derp.text)
            #for totalpoweron in param.iter('accumulated_power_on_minutes'):
            #    print(totalpoweron.text)

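I want to keep the 2 values from parameter_code 0000 while iterating over the remaining parameter_codes so I can put them into the database. Can anyone point me in the right direction? If I use param.iter('somevalue') to get each value, the code doesn't seem to iterate.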
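The behaviour of iter(tag) can be checked in isolation. The sketch below uses a trimmed copy of two parameter blocks from the XML above (an assumption for brevity). Element.iter(tag) walks the element and all of its descendants and yields only the elements whose tag matches, so on a single parameter it yields exactly one element; that may be why it looks as if it "doesn't iterate". To collect a field from every block, either loop over the parameters and read the field from each, or call iter(tag) on the enclosing page.

```python
import xml.etree.ElementTree as et

# Trimmed-down copy of two <parameter> blocks from the XML in the question.
XML = """
<background_scan_results_log_page>
  <parameter>
    <parameter_code base="hex">0000</parameter_code>
    <accumulated_power_on_minutes base="dec">579578</accumulated_power_on_minutes>
  </parameter>
  <parameter>
    <parameter_code base="hex">0001</parameter_code>
    <accumulated_power_on_minutes base="dec">82932</accumulated_power_on_minutes>
  </parameter>
</background_scan_results_log_page>
"""

page = et.fromstring(XML)

# iter(tag) searches the element and all its descendants for matching tags,
# so each <parameter> yields exactly one accumulated_power_on_minutes element.
for param in page:
    hits = [e.text for e in param.iter('accumulated_power_on_minutes')]
    print(param.findtext('parameter_code'), hits)

# Calling iter(tag) on the whole page collects the value from every block.
all_minutes = [e.text for e in page.iter('accumulated_power_on_minutes')]
print(all_minutes)
```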

1 answer:

Answer 0 (score: 0)

OK, so while there are ways to simplify/improve your code, it sounds like you're fine up to this point:

for param in log_page_tree.find('log_sense_data').find('background_scan_results_log_page'):

That loop already iterates over each parameter element.
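As one possible simplification of the chained find() calls (a sketch, not necessarily what the answer has in mind), ElementTree's limited XPath support lets a single path expression reach the parameter elements directly, and findtext() reads a child's text in one call. The sample below is a trimmed version of the question's XML with the same nesting; the variable names are illustrative.

```python
import xml.etree.ElementTree as et

# Trimmed sample with the same nesting as the question's XML.
XML = """<data>
  <log_sense_data>
    <page_code base="hex">15</page_code>
    <background_scan_results_log_page>
      <parameter><parameter_code base="hex">0000</parameter_code></parameter>
      <parameter><parameter_code base="hex">0001</parameter_code></parameter>
      <parameter><parameter_code base="hex">0002</parameter_code></parameter>
    </background_scan_results_log_page>
  </log_sense_data>
</data>"""

root = et.fromstring(XML)

# Check the page code first, as the question's code does.
if root.findtext('log_sense_data/page_code') == '15':
    # One path expression replaces the chained find() calls and returns
    # the <parameter> elements in document order.
    params = root.findall('log_sense_data/background_scan_results_log_page/parameter')
    codes = [p.findtext('parameter_code') for p in params]
    print(codes)
```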

But now you want to branch on whether parameter_code is 0000 and do something different in each case. So:

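# Map each "base" attribute to a function that converts the element's text.
converters = {
    'hex': lambda s: int(s, 16),
    'dec': int,
    'bool': lambda s: s == '1',  # bool('0') would be True, so compare explicitly
}

for param in log_page_tree.find('log_sense_data').find('background_scan_results_log_page'):
    if param.find('parameter_code').text == '0000':
        accumulated_power_on_minutes = int(param.find('accumulated_power_on_minutes').text)
        number_of_background_scans_performed = int(param.find('number_of_background_scans_performed').text)
    else:
        obj = {}
        for elem in param:  # getchildren() is deprecated (removed in Python 3.9)
            name = elem.tag
            base = elem.attrib.get('base')
            converter = converters.get(base, lambda x: x)  # unknown bases stay as strings
            value = converter(elem.text)
            obj[name] = value
        # do something with obj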
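Putting the pieces together, here is a minimal end-to-end sketch under a few stated assumptions: the helper names parse_background_scan and convert are hypothetical, bool fields are assumed to be encoded as "0"/"1" as in the sample XML, and any base other than hex/dec/bool (such as "string") is kept as text. It returns the two values from parameter_code 0000 as a summary dict and one dict per remaining parameter, which is a convenient shape for a later database insert.

```python
import xml.etree.ElementTree as et

# Convert element text according to its "base" attribute.
# Assumption: bool fields are encoded as "0"/"1", as in the sample XML.
CONVERTERS = {
    'hex': lambda s: int(s, 16),
    'dec': int,
    'bool': lambda s: s == '1',
}


def convert(elem):
    """Return elem.text converted by its base attribute; unknown bases stay strings."""
    text = (elem.text or '').strip()
    converter = CONVERTERS.get(elem.get('base'), lambda s: s)
    return converter(text)


def parse_background_scan(xml_text):
    """Hypothetical helper: split a log page into a summary and per-parameter rows."""
    root = et.fromstring(xml_text)
    summary = {}
    rows = []
    if root.findtext('log_sense_data/page_code') != '15':
        return summary, rows  # no log_sense_data, or not a background scan page
    path = 'log_sense_data/background_scan_results_log_page/parameter'
    for param in root.iterfind(path):
        if param.findtext('parameter_code') == '0000':
            summary['accumulated_power_on_minutes'] = convert(
                param.find('accumulated_power_on_minutes'))
            summary['number_of_background_scans_performed'] = convert(
                param.find('number_of_background_scans_performed'))
        else:
            # One dict per parameter, keyed by tag, with converted values.
            rows.append({child.tag: convert(child) for child in param})
    return summary, rows


# Usage with a trimmed copy of the question's XML.
SAMPLE = """<data><log_sense_data>
<page_code base="hex">15</page_code>
<background_scan_results_log_page>
<parameter>
<parameter_code base="hex">0000</parameter_code>
<accumulated_power_on_minutes base="dec">579578</accumulated_power_on_minutes>
<number_of_background_scans_performed base="dec">112</number_of_background_scans_performed>
</parameter>
<parameter>
<parameter_code base="hex">0001</parameter_code>
<du base="bool">0</du>
<description base="string">background medium scan parameter</description>
<logical_block_address base="hex">00000000478994d8</logical_block_address>
</parameter>
</background_scan_results_log_page>
</log_sense_data></data>"""

summary, rows = parse_background_scan(SAMPLE)
print(summary)
print(rows)
```

Keeping the 0000 values separate from the per-parameter rows means the rows can be written to the database in a loop (or a single batch insert) without special-casing the first block.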