Getting all attributes of an element and its children with Python

Date: 2014-01-07 10:15:02

Tags: python xml nested nodes

I'm trying to convert some XML to JSON (using xml.etree.ElementTree for now), and I'm learning Python by using it on my real, non-critical tasks. Sample XML:

<ExportData name="ExportData" hwId="0120">
  <input name="Ethernet" type="Ethernet" id="100" numTs="0" ... />
  <input name="ASI" type="ASI" id="0" numTs="1" ... >
    <setup name="ASI Input 1" id="1" description="ASI" tsSync="no" currentlyMonitored="true" ... />
  </input>
  <input name="FD1" type="FD" id="1" numTs="1" ... >
    <setup name="NewPreset1" id="1" description="642 MHz" ... />
  </input>
  <input name="FD2" type="FD" id="2" numTs="0" ... />
</ExportData>

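My current task: for every "input" node that has a child node named "setup", build a common (concatenated) name and ID (for the example above: name = "ASI:ASI Input 1" and id = "0:1"), then collect all attributes of both nodes, the current node and its child, except name and id (for the example above: numTs, description, tsSync, ...).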
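The joining rule above can be sketched as a small function working on two plain attribute dicts, the kind Element.attrib returns. The name combine_attrs and its key_fields parameter are hypothetical, chosen only for illustration; the attribute values are the ones shown in the sample (the elided "..." attributes are left out). The remaining names are sorted so the output order is deterministic.

```python
def combine_attrs(parent, child, key_fields=("name", "id")):
    """Join parent/child values of key_fields with ':' and list all other attribute names.

    parent and child are plain dicts such as Element.attrib; combine_attrs and
    key_fields are hypothetical names used for illustration only.
    """
    # e.g. {"name": "ASI:ASI Input 1", "id": "0:1"}; raises KeyError if a key is missing
    joined = {key: parent[key] + ":" + child[key] for key in key_fields}
    # every remaining attribute name from both nodes, without duplicates
    others = sorted((set(parent) | set(child)) - set(key_fields))
    return joined, others


joined, others = combine_attrs(
    {"name": "ASI", "type": "ASI", "id": "0", "numTs": "1"},
    {"name": "ASI Input 1", "id": "1", "description": "ASI",
     "tsSync": "no", "currentlyMonitored": "true"},
)
print(joined)  # {'name': 'ASI:ASI Input 1', 'id': '0:1'}
print(others)  # ['currentlyMonitored', 'description', 'numTs', 'tsSync', 'type']
```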

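I've googled many code samples based on different approaches (XPath, if/for over root.childNodes, etc.). I can now extract attributes from either the parent node or the child node (in different ways), but I still can't get all of them together...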
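On the XPath route: ElementTree's limited XPath support includes a [tag] predicate that keeps only elements with a direct child of that tag, so the "has a setup child" filter can live in the query itself. A minimal sketch using the sample XML with the elided "..." attributes removed (xml_text and with_setup are illustrative names):

```python
import xml.etree.ElementTree as ET

xml_text = """<ExportData name="ExportData" hwId="0120">
  <input name="Ethernet" type="Ethernet" id="100" numTs="0" />
  <input name="ASI" type="ASI" id="0" numTs="1">
    <setup name="ASI Input 1" id="1" description="ASI" tsSync="no" currentlyMonitored="true" />
  </input>
  <input name="FD1" type="FD" id="1" numTs="1">
    <setup name="NewPreset1" id="1" description="642 MHz" />
  </input>
  <input name="FD2" type="FD" id="2" numTs="0" />
</ExportData>"""

root = ET.fromstring(xml_text)
# "input[setup]" selects <input> elements that have an immediate <setup> child
with_setup = root.findall("input[setup]")
print([node.get("name") for node in with_setup])  # ['ASI', 'FD1']
```

Note that the predicate only looks at immediate children, which matches the nesting in this XML.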

Then I need to print the parsed data as JSON, like this:

{
 "data":[
  { "{#INPUTID}":"0:1", "{#INPUTNAME}":"ASI:ASI Input 1", "{#INPUTPARAM}":"numTs"       },
  { "{#INPUTID}":"0:1", "{#INPUTNAME}":"ASI:ASI Input 1", "{#INPUTPARAM}":"tsSync"      },
  { "{#INPUTID}":"1:1", "{#INPUTNAME}":"FD1:NewPreset1", "{#INPUTPARAM}":"description" },
  ...
 ]
}

(The JSON is only meant to be human-readable; any valid JSON is good enough.)
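Since any valid JSON is acceptable, the only difference between the pretty layout above and a one-line dump is whitespace. A small sketch with one hand-written row in the requested {"data": [...]} envelope (the variable names are illustrative):

```python
import json

# One hand-written row in the requested {"data": [...]} envelope
out = {"data": [
    {"{#INPUTID}": "0:1", "{#INPUTNAME}": "ASI:ASI Input 1", "{#INPUTPARAM}": "numTs"},
]}

readable = json.dumps(out, indent=2, sort_keys=True)  # multi-line, indented
compact = json.dumps(out, separators=(",", ":"))      # single line, no extra spaces

# Both strings decode back to the same structure, so either is valid JSON
print(json.loads(readable) == json.loads(compact) == out)  # True
```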

How can I solve this task in a graceful, Pythonic way (with a neat algorithm and proper error and exception handling)? Thanks in advance!
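For the error-handling part, one possible sketch: wrap parse errors (ElementTree raises ET.ParseError on malformed input) and skip input/setup pairs that lack an attribute needed for joining. The function parse_inputs and the KEYS constant are hypothetical names, and skipping rather than raising is an assumed policy, not the only reasonable one.

```python
import xml.etree.ElementTree as ET

KEYS = ("name", "id")  # hypothetical: attributes every paired node must carry


def parse_inputs(xml_text):
    """Return (input, setup) element pairs; raise ValueError on malformed XML."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # turn the parser's error into one the caller can handle uniformly
        raise ValueError("invalid XML: %s" % exc) from exc
    pairs = []
    for node in root.findall("input"):
        setup = node.find("setup")
        if setup is None:
            continue  # input without a setup child: not part of the task
        # skip, rather than crash on, nodes lacking an attribute to be joined
        if not all(k in node.attrib and k in setup.attrib for k in KEYS):
            continue
        pairs.append((node, setup))
    return pairs


try:
    parse_inputs("<ExportData><input")
except ValueError as err:
    print(err)
```

Skipping keeps one incomplete input from hiding all the others, at the cost of dropping it silently; raising (or logging) instead is equally defensible.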

UPD: my progress:

import json
import xml.etree.ElementTree as ET

# xmlText holds the XML document shown above (as a string)
ExportData = ET.fromstring(xmlText)

# First, create an empty output dict from the template;
# it will be filled with the needed data later
outData = {'data': []}

# Then, for every input node that contains a setup subnode,
# build two dicts: one for the node and one for the subnode.
# All further manipulations are done on these dicts.
for inputNode in ExportData.findall('input'):
  setupNode = inputNode.find('setup')
  if setupNode is not None:
    # Copy the attributes: deleting keys from .attrib directly
    # would also remove them from the parsed tree
    inputParams = dict(inputNode.attrib)
    setupParams = dict(setupNode.attrib)
    inputId = inputParams['id'] + ':' + setupParams['id']
    inputName = inputParams['name'] + ':' + setupParams['name']
    del inputParams['name'], inputParams['id']
    del setupParams['name'], setupParams['id']
    # Merge both dicts (setup values win if a key appears in both)
    commonParams = dict(inputParams)
    commonParams.update(setupParams)
    for param in commonParams:
      outData['data'].append({'{#INPUTID}': inputId,
                              '{#INPUTNAME}': inputName,
                              '{#INPUTPARAM}': param})

# Finally, dump the data to JSON
print(json.dumps(outData, sort_keys=True, indent=2))
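A pitfall worth knowing when deleting keys such as name and id: Element.attrib is the element's own attribute dictionary, not a copy, so del on it removes the attribute from the parsed tree as well. A small demonstration (SAMPLE and the other names are illustrative):

```python
import xml.etree.ElementTree as ET

SAMPLE = '<input name="ASI" type="ASI" id="0" numTs="1"/>'

# 1) Deleting through .attrib edits the element itself
live = ET.fromstring(SAMPLE)
attrs = live.attrib            # the element's own attribute dict
del attrs["name"]
print(live.get("name"))        # None: the attribute was removed from the element

# 2) Deleting from a copy leaves the element intact
safe = ET.fromstring(SAMPLE)
attrs_copy = dict(safe.attrib)
del attrs_copy["name"]
print(safe.get("name"))        # ASI
```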

1 answer:

Answer 0 (score: 0)

Here's some code to get you started:

import xml.etree.ElementTree as ET

s = """<ExportData name="ExportData" hwId="0120">
  <input name="Ethernet" type="Ethernet" id="100" numTs="0" />
  <input name="ASI" type="ASI" id="0" numTs="1" >
    <setup name="ASI Input 1" id="1" description="ASI" tsSync="no" currentlyMonitored="true" />
  </input>
  <input name="FD1" type="FD" id="1" numTs="1" >
    <setup name="NewPreset1" id="1" description="642 MHz" />
  </input>
  <input name="FD2" type="FD" id="2" numTs="0" />
</ExportData>"""

tree = ET.fromstring(s)
for node in tree.iter('input'):
  # first direct child tagged 'setup', or None if there is none
  child = next((c for c in node if c.tag == 'setup'), None)
  if child is None:
    continue
  print(node, child)

This produces the following output:

<Element 'input' at 0x1047541d0> <Element 'setup' at 0x104754210>
<Element 'input' at 0x104754250> <Element 'setup' at 0x104754290>

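This gives you the parent and child nodes. From there you can easily get their attributes with node.attrib and child.attrib. Then just join the attributes you want combined and format the rest.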
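Carrying that loop through to the requested output could look roughly like this. It is a sketch, not the answerer's code: the XML is trimmed to two inputs from the sample, KEYS and rows are illustrative names, and parameter names are sorted so the row order is stable.

```python
import json
import xml.etree.ElementTree as ET

s = """<ExportData name="ExportData" hwId="0120">
  <input name="Ethernet" type="Ethernet" id="100" numTs="0" />
  <input name="FD1" type="FD" id="1" numTs="1" >
    <setup name="NewPreset1" id="1" description="642 MHz" />
  </input>
</ExportData>"""

KEYS = ("name", "id")  # attributes that are concatenated, not listed as params
rows = []
tree = ET.fromstring(s)
for node in tree.iter('input'):
    child = next((c for c in node if c.tag == 'setup'), None)
    if child is None:
        continue
    input_id = node.get('id') + ':' + child.get('id')
    input_name = node.get('name') + ':' + child.get('name')
    # all remaining attribute names of the parent and the child
    params = sorted((set(node.attrib) | set(child.attrib)) - set(KEYS))
    for param in params:
        rows.append({'{#INPUTID}': input_id,
                     '{#INPUTNAME}': input_name,
                     '{#INPUTPARAM}': param})

print(json.dumps({'data': rows}, sort_keys=True, indent=2))
```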