Most efficient way to convert one XML file into another in Python (xmltodict, ElementTree, etc.)

Time: 2015-08-06 20:17:12

Tags: python xml elementtree xmltodict

Howdie do,

So I have the following two XML files.

File A:

<?xml version="1.0" encoding="UTF-8"?>
<GetShipmentUpdatesResult>
    <Shipments>
        <Shipment>
            <Container>
                <OrderNumber>5108046</OrderNumber>
                <ContainerNumber>5108046_1</ContainerNumber>
                <CustomerOrderNumber>abcq123</CustomerOrderNumber>
                <ShipDate>2015-07-12T12:00:00</ShipDate>
                <CarrierName>UPS</CarrierName>
                <TrackingNumber>1ZX20520A803682850</TrackingNumber>
                <StatusCode>InTransit</StatusCode>
                <Events>
                    <TrackingEvent>
                        <TimeStamp>2015-06-29T13:53:18</TimeStamp>
                        <City></City>
                        <StateOrProvince></StateOrProvince>
                        <Description>manifested from Warehouse</Description>
                        <TrackingStatus>Manifest</TrackingStatus>
                    </TrackingEvent>
                    <TrackingEvent>
                        <TimeStamp>2015-06-29T18:47:44</TimeStamp>
                        <City>Glenwillow</City>
                        <StateOrProvince>OH</StateOrProvince>
                        <Description>Status: AF Recorded</Description>
                        <TrackingStatus>In Transit</TrackingStatus>
                    </TrackingEvent>
                </Events>
            </Container>
        </Shipment>
        <Shipment>
            <Container>
                <OrderNumber>456789</OrderNumber>
                <ContainerNumber>44789</ContainerNumber>
                <CustomerOrderNumber>abcq123</CustomerOrderNumber>
                <ShipDate>2015-07-03T13:56:27</ShipDate>
                <CarrierName>UP2</CarrierName>
                <TrackingNumber>1Z4561230020</TrackingNumber>
                <StatusCode>IN_TRANSIT</StatusCode>
                <Events>
                    <TrackingEvent>
                        <TimeStamp>2015-07-03T13:56:27</TimeStamp>
                        <City>Glenwillow</City>
                        <StateOrProvince>OH</StateOrProvince>
                        <Description>manifested from Warehouse</Description>
                        <TrackingStatus>Manifest</TrackingStatus>
                    </TrackingEvent>
                </Events>
            </Container>
        </Shipment>
    </Shipments>
    <MatchingRecords>2</MatchingRecords>
    <RequestId></RequestId>
    <RecordsRemaining>0</RecordsRemaining>
</GetShipmentUpdatesResult>

File B:

<?xml version="1.0" encoding="UTF-8"?>
<getShipmentStatusResponse>
    <getShipmentStatusResult>
        <outcome>
            <result>Success</result>
            <error></error>
        </outcome>
        <shipments>
            <shipment>
                <orderID>123456</orderID>
                <containerNo>CD1863663C</containerNo>
                <shipDate>2015-06-29T18:47:44</shipDate>
                <carrier>UPS</carrier>
                <trackingNumber>1Z4561230001</trackingNumber>
                <statusCode>IN_TRANSIT</statusCode>
                <statusMessage>In Transit</statusMessage>
                <shipmentEvents>
                    <trackingUpdate>
                        <timeStamp>2015-06-29T13:53:18</timeStamp>
                        <city />
                        <state />
                        <trackingMessage>Manifest</trackingMessage>
                    </trackingUpdate>
                    <trackingUpdate>
                        <timeStamp>2015-06-29T18:47:44</timeStamp>
                        <city>Glenwillow</city>
                        <state>OH</state>
                        <trackingMessage>Shipped from warehouse</trackingMessage>
                    </trackingUpdate>
                </shipmentEvents>
            </shipment>
            <shipment>
                <orderID>456789</orderID>
                <containerNo>44789</containerNo>
                <shipDate>2015-07-03T13:56:27</shipDate>
                <carrier>UP2</carrier>
                <trackingNumber>1Z4561230020</trackingNumber>
                <statusCode>IN_TRANSIT</statusCode>
                <statusMessage>In Transit</statusMessage>
                <shipmentEvents>
                    <trackingUpdate>
                        <timeStamp>2015-07-03T13:56:27</timeStamp>
                        <city>Glenwillow</city>
                        <state>OH</state>
                        <trackingMessage>Manifest</trackingMessage>
                    </trackingUpdate>
                </shipmentEvents>
            </shipment>
        </shipments>
        <matchingRecords>2</matchingRecords>
        <requestId></requestId>
        <remainingRecords>0</remainingRecords>
    </getShipmentStatusResult>
</getShipmentStatusResponse>

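I basically need to read File A and convert it into File B. So far I have been using xmltodict to parse File A, but it only reads the top-level elements. It seems I would have to write multiple for loops to do this with xmltodict: one loop over each parent, then another over its child elements.

Looking at ElementTree, it seems to work the same way. Does anyone know of another way to do this without having to write multiple for loops?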

1 answer:

Answer 0 (score: 2)

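Since your output is more or less an exact mapping of your input, with only the element names appearing to differ, I would suggest doing the transformation declaratively with XSLT.

Assuming that each input element name maps unconditionally to one output element name (which is what it looks like, judging by your samples), here is an XSLT 1.0 transformation to get you started (basic instructions on how to use XSLT in Python can be found in this answer):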

<xsl:transform version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:my="http://tempuri.org/config"
  exclude-result-prefixes="my"
>
  <xsl:output method="xml" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*" />

  <my:config>
    <nameMap from="Shipments" to="shipments" />
    <nameMap from="Shipment" to="shipment" />
    <nameMap from="Container" to="-" />
  </my:config>
  <xsl:variable name="nameMap" select="document('')/*/my:config/nameMap" />

  <xsl:template match="node() | @*" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <getShipmentStatusResponse>
      <xsl:apply-templates select="@* | node()" />
    </getShipmentStatusResponse>
  </xsl:template>

  <xsl:template match="GetShipmentUpdatesResult">
    <getShipmentStatusResult>
      <outcome>
        <result>Success</result>
        <error></error>
      </outcome>
      <xsl:apply-templates select="@* | node()" />
    </getShipmentStatusResult>
  </xsl:template>

  <xsl:template match="*">
    <xsl:variable name="map" select="$nameMap[@from = name(current())]" />
    <xsl:choose>
      <xsl:when test="$map/@to = '-'">
        <xsl:apply-templates select="@* | node()" />
      </xsl:when>
      <xsl:when test="$map/@to != ''">
        <xsl:element name="{$map/@to}">
          <xsl:apply-templates select="@* | node()" />
        </xsl:element>
      </xsl:when>
      <xsl:when test="$map/@to = ''" />
      <xsl:otherwise>
        <xsl:call-template name="identity" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:transform>

The transformation works as follows:

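  • At its core, the identity transformation is at work: any node that is not matched by a dedicated template is copied to the output as-is.
  • It contains an in-place configuration section (<my:config>) where you can put <nameMap> elements that map input names to output names. This works through the following conventions (implemented in the few lines of <xsl:template match="*">):

    • If an input element matches a @from and the corresponding @to is filled in, the element is renamed and its children are processed.
    • If an input element matches a @from and @to is '-', the element itself is removed, but its children are still processed.
    • If an input element matches a @from and @to is empty, it is removed from the output entirely, including its children.
    • In all other cases, the input element is copied 1:1 by the identity template.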
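
For comparison, the same four conventions can be sketched in plain Python with the standard-library ElementTree module, using one recursive function instead of a nested loop per level. This is only an illustration and not part of the stylesheet: `NAME_MAP` and `remap` are hypothetical names, and the sketch assumes data-oriented XML, so text directly inside an unwrapped element and tail text are ignored.

```python
import xml.etree.ElementTree as ET

# Same rules as the <my:config> block: old name -> new name,
# "-" to unwrap the element, "" to drop it together with its subtree.
NAME_MAP = {"Shipments": "shipments", "Shipment": "shipment", "Container": "-"}


def remap(elem, name_map):
    """Return the list of output elements produced from one input element."""
    target = name_map.get(elem.tag, elem.tag)  # unmapped names are kept as-is
    if target == "":
        return []  # dropped entirely
    # Process the children first; each child may yield zero, one, or many elements.
    children = [out for child in elem for out in remap(child, name_map)]
    if target == "-":
        return children  # the wrapper disappears, its children stay
    new = ET.Element(target, dict(elem.attrib))
    new.text = elem.text
    new.extend(children)
    return [new]


source = ET.fromstring(
    "<Shipments><Shipment><Container>"
    "<OrderNumber>5108046</OrderNumber>"
    "</Container></Shipment></Shipments>"
)
(result,) = remap(source, NAME_MAP)
print(ET.tostring(result, encoding="unicode"))
```

The function returns a list because an unwrapped element can contribute several children to its parent and a dropped element contributes none.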

At the moment the output looks like this. Add more <nameMap> rules to define the behavior for the remaining input elements.

<getShipmentStatusResponse>
  <getShipmentStatusResult>
    <outcome>
      <result>Success</result>
      <error />
    </outcome>
    <shipments>
      <shipment>
        <OrderNumber>5108046</OrderNumber>
        <ContainerNumber>5108046_1</ContainerNumber>
        <CustomerOrderNumber>abcq123</CustomerOrderNumber>
        <ShipDate>2015-07-12T12:00:00</ShipDate>
        <CarrierName>UPS</CarrierName>
        <TrackingNumber>1ZX20520A803682850</TrackingNumber>
        <StatusCode>InTransit</StatusCode>
        <Events>
          <TrackingEvent>
            <TimeStamp>2015-06-29T13:53:18</TimeStamp>
            <City />
            <StateOrProvince />
            <Description>manifested from Warehouse</Description>
            <TrackingStatus>Manifest</TrackingStatus>
          </TrackingEvent>
          <TrackingEvent>
            <TimeStamp>2015-06-29T18:47:44</TimeStamp>
            <City>Glenwillow</City>
            <StateOrProvince>OH</StateOrProvince>
            <Description>Status: AF Recorded</Description>
            <TrackingStatus>In Transit</TrackingStatus>
          </TrackingEvent>
        </Events>
      </shipment>
      <shipment>
        <OrderNumber>456789</OrderNumber>
        <ContainerNumber>44789</ContainerNumber>
        <CustomerOrderNumber>abcq123</CustomerOrderNumber>
        <ShipDate>2015-07-03T13:56:27</ShipDate>
        <CarrierName>UP2</CarrierName>
        <TrackingNumber>1Z4561230020</TrackingNumber>
        <StatusCode>IN_TRANSIT</StatusCode>
        <Events>
          <TrackingEvent>
            <TimeStamp>2015-07-03T13:56:27</TimeStamp>
            <City>Glenwillow</City>
            <StateOrProvince>OH</StateOrProvince>
            <Description>manifested from Warehouse</Description>
            <TrackingStatus>Manifest</TrackingStatus>
          </TrackingEvent>
        </Events>
      </shipment>
    </shipments>
    <MatchingRecords>2</MatchingRecords>
    <RequestId />
    <RecordsRemaining>0</RecordsRemaining>
  </getShipmentStatusResult>
</getShipmentStatusResponse>
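
As a starting point for those remaining rules, the short Python sketch below prints candidate <nameMap> lines that could be pasted into <my:config>. The mapping is a guess made by lining up File A against File B and is not confirmed anywhere in the question. In particular, dropping CustomerOrderNumber and Description and mapping TrackingStatus to trackingMessage are assumptions, and `GUESSED_NAMES` and `name_map_rules` are hypothetical names.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical mapping, guessed by comparing File A with File B.
GUESSED_NAMES = {
    "OrderNumber": "orderID",
    "ContainerNumber": "containerNo",
    "CustomerOrderNumber": "",  # assumption: no counterpart in File B, so drop it
    "ShipDate": "shipDate",
    "CarrierName": "carrier",
    "TrackingNumber": "trackingNumber",
    "StatusCode": "statusCode",
    "Events": "shipmentEvents",
    "TrackingEvent": "trackingUpdate",
    "TimeStamp": "timeStamp",
    "City": "city",
    "StateOrProvince": "state",
    "Description": "",  # assumption: File B keeps a single message field
    "TrackingStatus": "trackingMessage",
    "MatchingRecords": "matchingRecords",
    "RequestId": "requestId",
    "RecordsRemaining": "remainingRecords",
}


def name_map_rules(names):
    """Render one <nameMap> element per entry, with attribute values escaped."""
    return [
        f"<nameMap from={quoteattr(src)} to={quoteattr(dst)} />"
        for src, dst in names.items()
    ]


print("\n".join(name_map_rules(GUESSED_NAMES)))
```

Renaming alone will not reproduce File B exactly: <statusMessage> has no obvious source element in File A, and values such as InTransit versus IN_TRANSIT differ, so those would presumably need dedicated templates.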