Extracting specific data from an XML document in Python

Time: 2019-03-07 13:36:16

Tags: python xml

Part of my XML document:

<?xml version="1.0"?>
<orderDocument xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns="http://plmpack.com/stackbuilder/StackBuilderXMLExport.xsd">
  <author>cmj</author>
  <date>2019-02-14T10:45:48.4872033+01:00</date>
  <unit>mm|kg</unit>
  <orderType>
    <orderNumber>Analysis0</orderNumber>
    <loadSpace>
      <id>1</id>
      <name>Pallet0</name>
      <length>1200</length>
      <width>800</width>
      <maxLoadHeight>1500</maxLoadHeight>
      <maxLoadWeight>0</maxLoadWeight>
      <baseHeight>144</baseHeight>
      <maxLengthOverhang>0</maxLengthOverhang>
      <maxWidthOverhang>0</maxWidthOverhang>
    </loadSpace>
    <item>
      <id>1</id>
      <name>paper0</name>
      <length>320</length>
      <width>260</width>
      <height>120</height>
      <weight>5</weight>
      <maxWeightOnTop>0</maxWeightOnTop>
      <permittedOrientations>001</permittedOrientations>
    </item>
    <orderLine>
      <itemId>1</itemId>
      <quantity>110</quantity>
    </orderLine>
    <load>
      <loadSpaceId>1</loadSpaceId>
      <statistics>
        <loadVolume>1098240000</loadVolume>
        <volumeUtilization>84.365781710914447</volumeUtilization>
        <loadWeight>550</loadWeight>
        <weightUtilization>INF</weightUtilization>
        <loadHeight>1320</loadHeight>
        <cOfG>
          <x>0</x>
          <y>0</y>
          <z>0</z>
        </cOfG>
      </statistics>
      <placement>
        <itemId>1</itemId>
        <x>20</x>
        <y>10</y>
        <z>144</z>
        <L>XP</L>
        <W>YP</W>
      </placement>
      <placement>
        <itemId>1</itemId>
        <x>20</x>
        <y>270</y>
        <z>144</z>
        <L>XP</L>
        <W>YP</W>
      </placement>
      <placement>
        <itemId>1</itemId>
        <x>20</x>
        <y>530</y>
        <z>144</z>
        <L>XP</L>
        <W>YP</W>
      </placement>
      <placement>
        <itemId>1</itemId>
        <x>340</x>
        <y>10</y>
        <z>144</z>
        <L>XP</L>
        <W>YP</W>
      </placement>
    </load>
  </orderType>
</orderDocument>

The code I have so far:

import os
import xml.etree.ElementTree as ET

# Locate the XML file in the same folder as this script
base_path = os.path.dirname(os.path.realpath(__file__))
xml_file = os.path.join(base_path, "first_try_palletizing.xml")

# Parse the file and get the top-level <orderDocument> element
tree = ET.parse(xml_file)
root = tree.getroot()

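The program is for a palletizing robot arm. The XML data comes from a program that calculates the best way to stack the objects. I need to extract the "placement" data (x, y, z, L, W) so that I can feed it into the robot program. I'm new to Python, so assume I know nothing.

I have tried the code below, but I can't get any deeper than these elements: orderNumber, loadSpace, item, orderLine, load.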

for child in root:
    for element in child:
        print(element)
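Those two nested loops descend exactly two levels: the outer loop visits the children of `orderDocument` (`author`, `date`, `unit`, `orderType`) and the inner loop visits their children, which is why nothing below `orderNumber`, `loadSpace`, `item`, `orderLine` and `load` is reached. The printed tags also carry the default namespace in braces, because the root declares `xmlns="http://plmpack.com/stackbuilder/StackBuilderXMLExport.xsd"`. A minimal sketch, run on a trimmed copy of the XML (only one placement kept, other elements removed), showing how `Element.iter()` reaches every level and why searching for a bare `placement` name finds nothing:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document: same default namespace, one placement.
SAMPLE = """<?xml version="1.0"?>
<orderDocument xmlns="http://plmpack.com/stackbuilder/StackBuilderXMLExport.xsd">
  <orderType>
    <load>
      <placement>
        <itemId>1</itemId>
        <x>20</x>
        <y>10</y>
        <z>144</z>
        <L>XP</L>
        <W>YP</W>
      </placement>
    </load>
  </orderType>
</orderDocument>"""

# The default xmlns is folded into every tag name as "{uri}localname".
NS = "{http://plmpack.com/stackbuilder/StackBuilderXMLExport.xsd}"

root = ET.fromstring(SAMPLE)

# Element.iter() yields the element itself and all descendants at any depth,
# unlike a fixed number of nested "for child in ..." loops.
all_tags = [el.tag for el in root.iter()]
for tag in all_tags:
    print(tag)

# iter() can also filter by tag, but the name must include the namespace:
print(len(list(root.iter(NS + "placement"))))  # 1
print(len(list(root.iter("placement"))))       # 0, the bare name never matches
```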

Sorry it's a bit messy, but this is my first time using Stack Overflow.

1 answer:

Answer 0 (score: 0)

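The code below bypasses the namespaces (it strips the `{namespace}` prefix from every tag while parsing) and then looks for the "placement" elements.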

import xml.etree.ElementTree as ET
from io import StringIO  # Python 3; Python 2 used "from StringIO import StringIO"

xml = '''<?xml version="1.0"?>
<orderDocument xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns="http://plmpack.com/stackbuilder/StackBuilderXMLExport.xsd">
  <author>cmj</author>
  <date>2019-02-14T10:45:48.4872033+01:00</date>
  <unit>mm|kg</unit>
  <orderType>
    <orderNumber>Analysis0</orderNumber>
    <loadSpace>
      <id>1</id>
      <name>Pallet0</name>
      <length>1200</length>
      <width>800</width>
      <maxLoadHeight>1500</maxLoadHeight>
      <maxLoadWeight>0</maxLoadWeight>
      <baseHeight>144</baseHeight>
      <maxLengthOverhang>0</maxLengthOverhang>
      <maxWidthOverhang>0</maxWidthOverhang>
    </loadSpace>
    <item>
      <id>1</id>
      <name>paper0</name>
      <length>320</length>
      <width>260</width>
      <height>120</height>
      <weight>5</weight>
      <maxWeightOnTop>0</maxWeightOnTop>
      <permittedOrientations>001</permittedOrientations>
    </item>
    <orderLine>
      <itemId>1</itemId>
      <quantity>110</quantity>
    </orderLine>
    <load>
      <loadSpaceId>1</loadSpaceId>
      <statistics>
        <loadVolume>1098240000</loadVolume>
        <volumeUtilization>84.365781710914447</volumeUtilization>
        <loadWeight>550</loadWeight>
        <weightUtilization>INF</weightUtilization>
        <loadHeight>1320</loadHeight>
        <cOfG>
          <x>0</x>
          <y>0</y>
          <z>0</z>
        </cOfG>
      </statistics>
      <placement>
        <itemId>1</itemId>
        <x>20</x>
        <y>10</y>
        <z>144</z>
        <L>XP</L>
        <W>YP</W>
      </placement>
      <placement>
        <itemId>1</itemId>
        <x>20</x>
        <y>270</y>
        <z>144</z>
        <L>XP</L>
        <W>YP</W>
      </placement>
      <placement>
        <itemId>1</itemId>
        <x>20</x>
        <y>530</y>
        <z>144</z>
        <L>XP</L>
        <W>YP</W>
      </placement>
      <placement>
        <itemId>1</itemId>
        <x>340</x>
        <y>10</y>
        <z>144</z>
        <L>XP</L>
        <W>YP</W>
      </placement>
    </load>
  </orderType>
</orderDocument>'''

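placements_data = []  # one {tag: text} dict per placement

it = ET.iterparse(StringIO(xml))
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
root = it.root
placements = root.findall('.//placement')
for idx, placement in enumerate(placements):
    print('placement # {}'.format(idx))
    # getchildren() was removed in Python 3.9; iterate over the element instead.
    # [1:] skips <itemId> and keeps x, y, z, L, W.
    fields = list(placement)[1:]
    placements_data.append({child.tag: child.text for child in fields})
    for child in fields:
        print('\t{} - {}'.format(child.tag, child.text))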

Output:

placement # 0
    x - 20
    y - 10
    z - 144
    L - XP
    W - YP
placement # 1
    x - 20
    y - 270
    z - 144
    L - XP
    W - YP
placement # 2
    x - 20
    y - 530
    z - 144
    L - XP
    W - YP
placement # 3
    x - 340
    y - 10
    z - 144
    L - XP
    W - YP
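The values in that output are still strings, and the tag rewriting modifies the parsed tree. As an alternative sketch (assuming Python 3.7+ for `dataclasses` and 3.8+ for the `{*}` wildcard), ElementTree's `findall()`, `iterfind()` and `findtext()` accept a prefix-to-URI mapping, so the namespaced document can be searched as-is and the numbers converted before they go to the robot. `Placement`, `read_placements`, `_child_text` and the prefix `sb` are hypothetical names chosen for this illustration; coordinates are assumed to be millimetres (from `<unit>mm|kg</unit>`) and parsed as floats; the input format the robot program expects is not stated in the question.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# "sb" is an arbitrary local prefix; only the URI must match the file's xmlns.
NS = {"sb": "http://plmpack.com/stackbuilder/StackBuilderXMLExport.xsd"}


@dataclass
class Placement:
    """Hypothetical record for one <placement> (lengths assumed in mm)."""
    item_id: int
    x: float
    y: float
    z: float
    L: str  # text of <L>, e.g. "XP"
    W: str  # text of <W>, e.g. "YP"


def _child_text(elem, tag):
    """Stripped text of the namespaced child <tag>; raise if it is missing."""
    text = elem.findtext("sb:" + tag, namespaces=NS)
    if text is None:
        raise ValueError("<placement> has no <{}> child".format(tag))
    return text.strip()


def read_placements(root):
    """Convert every <placement> below root into a Placement, in document order."""
    return [
        Placement(
            item_id=int(_child_text(p, "itemId")),
            x=float(_child_text(p, "x")),
            y=float(_child_text(p, "y")),
            z=float(_child_text(p, "z")),
            L=_child_text(p, "L"),
            W=_child_text(p, "W"),
        )
        # No tag rewriting needed: the "sb:" prefix is resolved through NS.
        for p in root.iterfind(".//sb:placement", NS)
    ]


# Trimmed copy of the question's document (two of the four placements).
SAMPLE = """<?xml version="1.0"?>
<orderDocument xmlns="http://plmpack.com/stackbuilder/StackBuilderXMLExport.xsd">
  <orderType>
    <load>
      <placement><itemId>1</itemId><x>20</x><y>10</y><z>144</z><L>XP</L><W>YP</W></placement>
      <placement><itemId>1</itemId><x>340</x><y>10</y><z>144</z><L>XP</L><W>YP</W></placement>
    </load>
  </orderType>
</orderDocument>"""

root = ET.fromstring(SAMPLE)  # for the real file: ET.parse(xml_file).getroot()
placements = read_placements(root)
for p in placements:
    print(p)
# first line printed: Placement(item_id=1, x=20.0, y=10.0, z=144.0, L='XP', W='YP')

# Python 3.8+: "{*}" matches a tag in any namespace, without a prefix map.
print(len(root.findall(".//{*}placement")), "placement elements")
```

Keeping the namespace mapping instead of renaming tags leaves the parsed tree unchanged, which matters if the same tree is later written back out or searched for other elements.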