Parsing nested XML in Python

Time: 2015-01-07 07:41:22

Tags: python xml parsing xml-parsing

I have this XML file:

<?xml version="1.0" ?><XMLSchemaPalletLoadTechData xmlns="http://tempuri.org/XMLSchemaPalletLoadTechData.xsd">
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <Holder>sample</Holder>
    <ProcessToolName>sample</ProcessToolName>
    <RecipeName>sample</RecipeName>
    <PalletName>sample</PalletName>
    <PalletPosition>sample</PalletPosition>
    <IsControl>sample</IsControl>
    <LoadPosition>sample</LoadPosition>
    <HolderJob>sample</HolderJob>
    <IsSPC>sample</IsSPC>
    <MeasurementType>sample</MeasurementType>
  </TechDataParams>
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <Holder>sample</Holder>
    <ProcessToolName>sample</ProcessToolName>
    <RecipeName>sample</RecipeName>
    <PalletName>sample</PalletName>
    <PalletPosition>sample</PalletPosition>
    <IsControl>sample</IsControl>
    <LoadPosition>sample</LoadPosition>
    <HolderJob>sample</HolderJob>
    <IsSPC>sample</IsSPC>
    <MeasurementType>XRF</MeasurementType>
  </TechDataParams>
</XMLSchemaPalletLoadTechData>

Here is my code for parsing the XML:

for data in xml.getElementsByTagName('TechDataParams'):
    #parse xml
    runnum=data.getElementsByTagName('RunNumber')[0].firstChild.nodeValue
    hold=data.getElementsByTagName('Holder')[0].firstChild.nodeValue
    processtn=data.getElementsByTagName('ProcessToolName')[0].firstChild.nodeValue
    recipedata=data.getElementsByTagName('RecipeName')[0].firstChild.nodeValue
    palletna=data.getElementsByTagName('PalletName')[0].firstChild.nodeValue
    palletposi=data.getElementsByTagName('PalletPosition')[0].firstChild.nodeValue
    control = data.getElementsByTagName('IsControl')[0].firstChild.nodeValue
    loadpos=data.getElementsByTagName('LoadPosition')[0].firstChild.nodeValue
    holderjob=data.getElementsByTagName('HolderJob')[0].firstChild.nodeValue
    spc = data.getElementsByTagName('IsSPC')[0].firstChild.nodeValue
    mestype = data.getElementsByTagName('MeasurementType')[0].firstChild.nodeValue

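But when I print each node, I only get one set of 'TechDataParams', whereas I want to get every 'TechDataParams' from the XML.

If my question is unclear, please let me know.

3 Answers:

Answer 0: (score: 1)

Please don't use minidom for in-depth XML parsing unless you want to pull your hair out.

I would use the xmltodict module here. In one line, you get a list of dictionaries containing all the data you need: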

import xmltodict

data = """your xml here"""

data = xmltodict.parse(data)['XMLSchemaPalletLoadTechData']['TechDataParams']
for params in data:
    print dict(params)

Prints:

{u'PalletPosition': u'sample', u'HolderJob': u'sample', u'RunNumber': u'sample', u'ProcessToolName': u'sample', u'RecipeName': u'sample', u'IsControl': u'sample', u'PalletName': u'sample', u'LoadPosition': u'sample', u'MeasurementType': u'sample', u'Holder': u'sample', u'IsSPC': u'sample'}
{u'PalletPosition': u'sample', u'HolderJob': u'sample', u'RunNumber': u'sample', u'ProcessToolName': u'sample', u'RecipeName': u'sample', u'IsControl': u'sample', u'PalletName': u'sample', u'LoadPosition': u'sample', u'MeasurementType': u'XRF', u'Holder': u'sample', u'IsSPC': u'sample'}

Answer 1: (score: 0)

Here is an example for your case. Replace file_path with your own path.

I replaced the RunNumber values with 001 and 002.

#!/usr/bin/python
# -*- coding: utf-8 -*-

from xml.dom import minidom

file_path = 'C:\\temp\\test.xml'

doc = minidom.parse(file_path)
TechDataParams = doc.getElementsByTagName('TechDataParams')
for t in TechDataParams:
    num = t.getElementsByTagName('RunNumber')[0]
    print 'num is ', num.firstChild.data

Output:

num is  001
num is  002
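
To tie this back to the question: one plausible reason for seeing only one set of values is that the variables are overwritten on every pass and printed after the loop ends, although the question does not show where the printing happens. The sketch below is a minimal illustration of collecting every TechDataParams record inside the loop with minidom. The helper name collect_params and the shortened SAMPLE_XML (three child tags instead of eleven) are assumptions made for the example, not part of the original post.

```python
from xml.dom import minidom

# Shortened stand-in for the question's file; only three child tags are kept.
SAMPLE_XML = """<?xml version="1.0" ?>
<XMLSchemaPalletLoadTechData xmlns="http://tempuri.org/XMLSchemaPalletLoadTechData.xsd">
  <TechDataParams>
    <RunNumber>001</RunNumber>
    <Holder>sample</Holder>
    <MeasurementType>sample</MeasurementType>
  </TechDataParams>
  <TechDataParams>
    <RunNumber>002</RunNumber>
    <Holder>sample</Holder>
    <MeasurementType>XRF</MeasurementType>
  </TechDataParams>
</XMLSchemaPalletLoadTechData>"""


def collect_params(xml_text):
    """Return one dict per TechDataParams element (hypothetical helper)."""
    doc = minidom.parseString(xml_text)
    records = []
    for params in doc.getElementsByTagName('TechDataParams'):
        record = {}
        for child in params.childNodes:
            # Skip the whitespace text nodes that sit between child elements.
            if child.nodeType != child.ELEMENT_NODE:
                continue
            # An empty element has no firstChild, so fall back to None.
            text_node = child.firstChild
            record[child.tagName] = text_node.nodeValue if text_node else None
        # Store the record inside the loop, before the next pass replaces it.
        records.append(record)
    return records


records = collect_params(SAMPLE_XML)
for record in records:
    print(record)
```

Iterating over childNodes avoids writing one getElementsByTagName call per field, and checking firstChild guards against empty elements, which would otherwise raise an AttributeError.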

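Answer 2: (score: 0)

The same can be done with the lxml.etree module.

  1. The input contains a namespace, namely http://tempuri.org/XMLSchemaPalletLoadTechData.xsd.
  2. Use the xpath method to find the target TechDataParams tags.
  3. Get the children of each TechDataParams tag and build a dictionary whose key is the tag name and whose value is the tag's text.
  4. Append the dictionary to the list variable TechDataParams_info.
  5. Code: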

    from lxml import etree
    root = etree.fromstring(content)
    TechDataParams_info = []
    for  i in root.xpath("//a:XMLSchemaPalletLoadTechData/a:TechDataParams", namespaces={"a": 'http://tempuri.org/XMLSchemaPalletLoadTechData.xsd'}):
        temp = dict()
        for j in i.getchildren():
            temp[j.tag.split("}", 1)[-1]] = j.text
        TechDataParams_info.append(temp)
    
    print TechDataParams_info
    

    Output:

    [{'PalletPosition': 'sample', 'HolderJob': 'sample', 'RunNumber': 'sample', 'ProcessToolName': 'sample', 'RecipeName': 'sample', 'IsControl': 'sample', 'PalletName': 'sample', 'LoadPosition': 'sample', 'MeasurementType': 'sample', 'Holder': 'sample', 'IsSPC': 'sample'}, {'PalletPosition': 'sample', 'HolderJob': 'sample', 'RunNumber': 'sample', 'ProcessToolName': 'sample', 'RecipeName': 'sample', 'IsControl': 'sample', 'PalletName': 'sample', 'LoadPosition': 'sample', 'MeasurementType': 'XRF', 'Holder': 'sample', 'IsSPC': 'sample'}]
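
If lxml is not installed, the same steps can be sketched with the standard library's xml.etree.ElementTree. This is an illustrative variant rather than the answer's own code: it swaps xpath for findall with a prefix map, and the CONTENT string is a shortened stand-in for the question's file with only two child tags per record.

```python
import xml.etree.ElementTree as ET

NAMESPACE = 'http://tempuri.org/XMLSchemaPalletLoadTechData.xsd'

# Shortened stand-in for the question's XML (assumption for this example).
CONTENT = """<XMLSchemaPalletLoadTechData xmlns="http://tempuri.org/XMLSchemaPalletLoadTechData.xsd">
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <MeasurementType>sample</MeasurementType>
  </TechDataParams>
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <MeasurementType>XRF</MeasurementType>
  </TechDataParams>
</XMLSchemaPalletLoadTechData>"""

root = ET.fromstring(CONTENT)
tech_data_params_info = []

# Map the prefix "a" to the document's default namespace, as in the xpath call.
for params in root.findall('a:TechDataParams', {'a': NAMESPACE}):
    temp = {}
    for child in params:
        # Tags look like "{namespace}RunNumber"; keep only the local name.
        temp[child.tag.split('}', 1)[-1]] = child.text
    tech_data_params_info.append(temp)

print(tech_data_params_info)
```

Because the root element here is XMLSchemaPalletLoadTechData itself, findall only needs the relative path to its direct TechDataParams children.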