Need help parsing an XML file

Date: 2017-10-18 09:20:39

Tags: python xml-parsing elementtree

I'm trying to parse an XML file and I'm stuck.

A quick look at my XML file:

<editrust>
  <flux ref='ITFR2006' sens='IN'>
    <intervalle ref='H10'>
      <terminé>1</terminé>
      <prisEnComtpe>1</prisEnComtpe>
    </intervalle>
    <intervalle ref='H60'>
      <terminé>11</terminé>
      <prisEnComtpe>11</prisEnComtpe>
    </intervalle>
    <intervalle ref='D1'>
      <terminé>150</terminé>
      <prisEnComtpe>150</prisEnComtpe>
    </intervalle>
    <intervalle ref='D2'>
      <terminé>150</terminé>
      <prisEnComtpe>150</prisEnComtpe>
    </intervalle>
  </flux>

  <flux ref='ITFR2007_2021' sens='IN'>
    <intervalle ref='H10'>
      <terminé>2</terminé>
      <prisEnComtpe>2</prisEnComtpe>
    </intervalle>
    <intervalle ref='H60'>
      <terminé>181</terminé>
      <prisEnComtpe>121</prisEnComtpe>
    </intervalle>
    <intervalle ref='D1'>
      <terminé>600</terminé>
      <prisEnComtpe>600</prisEnComtpe>
    </intervalle>
    <intervalle ref='D2'>
      <terminé>600</terminé>
      <prisEnComtpe>600</prisEnComtpe>
    </intervalle>
  </flux>
...

I would like to produce something like a dictionary of lists:

{'ITFR2006': ['IN', 'H10', '1','1', 'H60', '11', '11', 'D1', '150', '150'],...

I wrote a script:

import xml.etree.ElementTree as etree
tree = etree.parse('fichier.xml')
root = tree.getroot()

flux = {}

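def findText(node):
    # Recursively visit every child of node
    for child in node:

        if child.attrib.get("ref"):

            # A <flux> element: start a new list keyed by its ref
            if "ITFR" in child.attrib.get("ref"):
                itfr = child.attrib.get("ref")
                flux[itfr] = []

                print("\n-----------------\n")

            print(child.attrib.get("ref"))

        # The 'sens' attribute is appended to the current flux list
        if child.attrib.get("sens"):
            flux[itfr].append(child.attrib.get("sens"))
            print(child.attrib.get("sens"))

        # Element text is printed (child.text can be None for empty elements)
        if child.text and child.text.strip():

            print(child.text.strip())

        findText(child)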


findText(root)

print(flux)

The script prints this:

-----------------

ITFR2006
IN
H10
1
1
H60
11
11
D1
150
150
D2
150
150

-----------------

ITFR2007_2021
IN
H10
2
2
H60
181
121
D1
600
600
D2
600
600
....

So print(flux) gives:

{'ITFR2006': ['IN'], 'ITFR2007_2021': ['IN'], 'ITFR2008': ['IN'], 'ITFR2011_2020': ['IN'], 'ITFR2012': ['OUT'], 'ITFR2013': ['OUT'], 'ITFR2014': ['OUT'], 'ITFR2017': ['OUT'], 'ITFR2018': ['OUT'], 'ITFR2019': ['OUT'], 'ITFR2023': ['OUT'], 'ITFR2024': ['OUT']}

I think this is a good start, but I can't fill my lists with the other values ('H10', '1', '1', 'H60', ...).

Any ideas on how to finish this?

Thanks

1 Answer:

Answer 0 (score: 1)

Here is one way to do it (tested with Python 3.6):

import xml.etree.ElementTree as etree
import pprint

tree = etree.parse('fichier.xml')
fluxdict = {}

for flux in tree.findall("flux"):
    # The key
    key = flux.get("ref")
    # Add first item to the list
    val = [flux.get("sens")]

    for intervalle in flux.findall("intervalle"):
        ref = intervalle.get("ref")
        termine = intervalle.findtext("terminé")
        prisEnComtpe = intervalle.findtext("prisEnComtpe")

        # Add items by extending list
        val.extend([ref, termine, prisEnComtpe])

    # Add key:val pair for this 'flux'
    fluxdict[key] = val

pprint.pprint(fluxdict)

Output:

{'ITFR2006': ['IN',
              'H10',
              '1',
              '1',
              'H60',
              '11',
              '11',
              'D1',
              '150',
              '150',
              'D2',
              '150',
              '150'],
 'ITFR2007_2021': ['IN',
                   'H10',
                   '2',
                   '2',
                   'H60',
                   '181',
                   '121',
                   'D1',
                   '600',
                   '600',
                   'D2',
                   '600',
                   '600']}
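For comparison, the question's recursive findText approach can also be made to work. Its lists only ever received 'IN' because flux[itfr].append(...) is called only for the sens attribute, and itfr is a local variable: each recursive call has its own scope, so the nested calls that visit <intervalle>, <terminé> and <prisEnComtpe> have no way to know which key they belong to. The minimal sketch below passes the current key down as a parameter instead. It assumes the tag and attribute names shown in the question; find_text and SAMPLE are hypothetical names, and the XML is inlined (first <flux> only) so it runs without fichier.xml.

```python
import xml.etree.ElementTree as etree

# Hypothetical inline sample: the first <flux> from the question, trimmed to two
# intervals, so the sketch runs without 'fichier.xml'.
SAMPLE = """<editrust>
  <flux ref='ITFR2006' sens='IN'>
    <intervalle ref='H10'>
      <terminé>1</terminé>
      <prisEnComtpe>1</prisEnComtpe>
    </intervalle>
    <intervalle ref='H60'>
      <terminé>11</terminé>
      <prisEnComtpe>11</prisEnComtpe>
    </intervalle>
  </flux>
</editrust>"""


def find_text(node, flux, key=None):
    """Recursively append every value under a <flux> to flux[key].

    key is the ref of the enclosing <flux>; it is passed down as an
    argument so that nested calls know which list to extend.
    """
    for child in node:
        ref = child.get("ref")
        if child.tag == "flux":
            # A new <flux> starts a new list, seeded with its 'sens' attribute
            key = ref
            flux[key] = [child.get("sens")]
        elif ref is not None:
            # An <intervalle> contributes its ref (H10, H60, ...)
            flux[key].append(ref)
        elif child.text and child.text.strip():
            # Leaf elements (<terminé>, <prisEnComtpe>) contribute their text
            flux[key].append(child.text.strip())
        find_text(child, flux, key)


flux = {}
find_text(etree.fromstring(SAMPLE), flux)
print(flux)
# {'ITFR2006': ['IN', 'H10', '1', '1', 'H60', '11', '11']}
```

With the full file, etree.parse('fichier.xml').getroot() would take the place of etree.fromstring(SAMPLE). The findall/findtext version in the answer is still easier to read, because it states the expected structure explicitly instead of relying on which attributes each element happens to carry.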