Merging XML files wherever the attribute IDs are the same, with Python

Time: 2014-11-26 22:20:45

Tags: python xml python-2.7 merge elementtree

I have two XML files that I am trying to merge.

XML1:

<hierachyAttributes>
    <attribute>
        <displayOrder>2</displayOrder>
        <attributeID>Demographics</attributeID>
        <children>
            <attribute>
                <displayOrder>1</displayOrder>
                <attributeID>age</attributeID>
        </children>
    </attribute>
</hierachyAttributes>

XML2:

<diseaseAttributes>
    <diseaseName>Cancer</diseaseName>
    <diseaseID>1322843</diseaseID>
    <metaAttributes>
        <attribute>
            <description>Age</description>
            <displayName>Age (years)</displayName>
            <attributeID>age</attributeID>
            <type>Double</type>
            <attributeCategory>Clinical</attributeCategory>
            <displayInSummary>TRUE</displayInSummary>
                <group>
                    <displayOrder>1</displayOrder>
                    <displayName>0 - &lt; 10</displayName>
                    <minValue>0</minValue>
                    <minInclusive>TRUE</minInclusive>
                    <maxValue>10</maxValue>
                    <maxInclusive>FALSE</maxInclusive>
                </group>
            </valueGroups>
        </attribute>
    </metaAttributes>
</diseaseAttributes>

Is there a way to merge them as shown below, even though the root tags differ (here hierachyAttributes and diseaseAttributes)? CombinedXML:

<hierachyAttributes>
<diseaseAttributes>
    <diseaseName>Cancer</diseaseName>
    <diseaseID>1322843</diseaseID>
    <metaAttributes>
        <attribute>
        <displayOrder>2</displayOrder>
        <attributeID>Demographics</attributeID>
        <children>
            <attribute>
                <displayOrder>1</displayOrder>
                <attributeID>age</attributeID>
                <description>Age</description>
                <displayName>Age (years)</displayName>
                <type>Double</type>
                <attributeCategory>Clinical</attributeCategory>
                <displayInSummary>TRUE</displayInSummary>
                    <group>
                        <displayOrder>1</displayOrder>
                        <displayName>0 - &lt; 10</displayName>
                        <minValue>0</minValue>
                        <minInclusive>TRUE</minInclusive>
                        <maxValue>10</maxValue>
                        <maxInclusive>FALSE</maxInclusive>
                    </group>
                </valueGroups>
            </attribute>
        </children>
    </metaAttributes>
</diseaseAttributes>
</hierachyAttributes>

That is, merge them wherever the attributeID is the same. I tried the following, but it only concatenates the XML files.

#!/usr/bin/env python
import sys
from xml.etree import ElementTree

def run(files):
    first = None
    for filename in files:
        data = ElementTree.parse(filename).getroot()
        if first is None:
            first = data
        else:
            first.extend(data)
    if first is not None:
        print ElementTree.tostring(first)

if __name__ == "__main__":
    run(sys.argv[1:])

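Alternatively, if one root tag is replaced by the other and I want the same output under a single root node, namely diseaseAttributes, how can I achieve that?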

2 answers:

Answer 0: (score: 3)

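Your first XML file is missing the closing </attribute> tag under <children>. Both files are also poorly structured: ridiculously verbose and confusingly named, so I am honestly not sure I can tell what you are trying to do.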
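A quick way to catch a problem like the missing closing tag before attempting any merge is to let the parser report it. The following is only a minimal sketch: check_well_formed is a hypothetical helper name, and BROKEN is XML1 copied from the question.

```python
import xml.etree.ElementTree as ET

# XML1 exactly as posted in the question (inner </attribute> is missing).
BROKEN = """<hierachyAttributes>
    <attribute>
        <displayOrder>2</displayOrder>
        <attributeID>Demographics</attributeID>
        <children>
            <attribute>
                <displayOrder>1</displayOrder>
                <attributeID>age</attributeID>
        </children>
    </attribute>
</hierachyAttributes>"""

# The same document with the missing closing tag restored.
FIXED = BROKEN.replace(
    "</attributeID>\n        </children>",
    "</attributeID>\n            </attribute>\n        </children>",
)


def check_well_formed(text):
    """Return None if text parses as XML, otherwise the parser's message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


print(check_well_formed(BROKEN))  # an error message naming a line and column
print(check_well_formed(FIXED))   # None
```

Running a check like this on each input file first makes it clear whether a failed merge is caused by the data or by the merge logic.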

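The first file looks as if it simply expresses a tree of "attribute" relationships. It is the second one I don't get: it seems to contain the definition of an attribute named "Age" and the type of data it holds, yet it sits underneath "Cancer". Why? My guess is that you want to display results broken down by age, but why would Age be tied to Cancer? What happens if you also have age data for, say, winter deaths from influenza? Does that get its own separate age attribute?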

Actually, my first question is... is this how XML2 is supposed to work:

<disease-definitions>
  <disease-definition id="1322843">
    <name>Cancer</name>

    <attribute-definitions>
      <attribute id="age" category="Clinical">
        <description>Age</description>
        <displayName>Age (years)</displayName>
        <type>Double</type>

        <attribute-summary displayed="true">
          <group>
            <displayName>&lt; 10</displayName>
            <range type="half-open">
              <min>0</min>
              <max>10</max>
            </range>
          </group>
          <group>
            <displayName>10 - 20</displayName>
            <range type="half-open">
              <min>10</min>
              <max>20</max>
            </range>
          </group>
        </attribute-summary>
      </attribute>
    </attribute-definitions>
  </disease-definition>

  <disease-definition id="1322844">
    <name>Influenza</name>

    <attribute-definitions>
      <attribute id="age" category="Clinical">
        <description>Age</description>
        <displayName>Age (years)</displayName>
        <type>Double</type>

        <attribute-summary displayed="true">
          <group>
            <displayName>Children</displayName>
            <range type="half-open">
              <min>0</min>
              <max>18</max>
            </range>
          </group>
          <group>
            <displayName>Adults</displayName>
            <range type="half-open">
              <min>18</min>
              <max>60</max>
            </range>
          </group>
          <group>
            <displayName>Elderly</displayName>
            <range type="half-open">
              <min>60</min>
            </range>
          </group>
        </attribute-summary>
      </attribute>
    </attribute-definitions>
  </disease-definition>
</disease-definitions>

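Because that seems to be what you are implying, and it is horrible even at the small scale I have written out here. I am also not sure how the hierarchy information fits in.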

Are the attributes and their hierarchy only about how the data is displayed? Even so, something like this seems better:

<attribute id="demographics">
  <title>Demographics</title>
  <children>
    <child id="age" />
    <child id="gender" />
  </children>
</attribute>

<attribute id="epidemiology">
  <title>Epidemiology</title>
  <children>
    <child id="reported-date" />
    <child id="variant-strains" />
  </children>
</attribute>

<attribute id="age">
  <title>Age</title>
  <description>Age in years</description>
  <category>Clinical</category>

  <data type="double">
    <min-value>0</min-value>
  </data>
</attribute>

<attribute id="gender">
  <title>Gender</title>

  <data type="options">
    <one-of>
      <option id="M">
        <title>Male</title>
      </option>
      <option id="F">
        <title>Female</title>
      </option>
    </one-of>
  </data>
</attribute>

and then

<disease-definitions>
  <disease id="1322843">
    <displayName>Cancer</displayName>

    <disease-attributes>
      <attribute ref-id="age">
        <displayName>Age of death</displayName>

        <displayed-in-summary>true</displayed-in-summary>
        <display-data format="histogram">
          <range max="10">Up to 10</range>
          <range min="10" max="25">Teenagers &amp; young adults</range>
          <range min="25" max="55">Adults</range>
          <range min="55">Elderly</range>
        </display-data>
      </attribute>

      <attribute ref-id="gender">
        <displayName>Gender of death</displayName>

        <displayed-in-summary>true</displayed-in-summary>
        <display-data format="pie">
          <slice option-id="M" background="#44F">Male deaths</slice>
          <slice option-id="F" background="#F44">Female deaths</slice>
        </display-data>
      </attribute>
    </disease-attributes>
  </disease>

  <disease id="1322844">
    <displayName>Influenza</displayName>

    <disease-attributes>
      <attribute ref-id="age">
        <displayName>Age of death</displayName>

        <displayed-in-summary>true</displayed-in-summary>
        <display-data format="grouped">
          <range max="10">Up to 10</range>
          <range min="10" max="25">Teenagers &amp; young adults</range>
          <range min="25" max="55">Adults</range>
          <range min="55">Elderly</range>
        </display-data>
      </attribute>
    </disease-attributes>
  </disease>

</disease-definitions>
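With a layout like this, "merging" becomes a lookup: each ref-id on a disease attribute points at a shared definition with the matching id. Here is a rough sketch of that lookup in Python. It assumes the shared definitions are wrapped in an <attributes> root (the snippets above show no common root, so that wrapper is my assumption), and resolve_references is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shared attribute definitions; the <attributes> wrapper is assumed.
DEFINITIONS = """<attributes>
  <attribute id="age">
    <title>Age</title>
    <description>Age in years</description>
  </attribute>
  <attribute id="gender">
    <title>Gender</title>
  </attribute>
</attributes>"""

# Per-disease usage, trimmed down from the example above.
DISEASES = """<disease-definitions>
  <disease id="1322843">
    <displayName>Cancer</displayName>
    <disease-attributes>
      <attribute ref-id="age">
        <displayName>Age of death</displayName>
      </attribute>
      <attribute ref-id="gender">
        <displayName>Gender of death</displayName>
      </attribute>
    </disease-attributes>
  </disease>
</disease-definitions>"""


def resolve_references(definitions_xml, diseases_xml):
    """Pair every ref-id with the title of the definition it points to."""
    definitions = {}
    for attr in ET.fromstring(definitions_xml).iter("attribute"):
        definitions[attr.get("id")] = attr

    resolved = []
    for disease in ET.fromstring(diseases_xml).iter("disease"):
        for ref in disease.iter("attribute"):
            target = definitions.get(ref.get("ref-id"))
            # A dangling ref-id yields None instead of raising.
            title = target.findtext("title") if target is not None else None
            resolved.append(
                (disease.get("id"), ref.findtext("displayName"), title)
            )
    return resolved


for row in resolve_references(DEFINITIONS, DISEASES):
    print(row)
```

Because the definitions are indexed once by id, the same "age" definition can be referenced from any number of diseases without being copied into each one.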

Answer 1: (score: 0)

I think you would be best off installing the lxml module with

pip install lxml

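and using it for any XML-related code, since it is nicer to work with than the built-in modules. Take a look at this tutorial; there are plenty of ways to load, parse, and process the attribute elements from each file in a single process.

More useful information is available at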

Python XML processing with lxml
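
As a rough illustration of the kind of processing meant here, the sketch below copies the child elements of each matching attribute from the second file into the attribute with the same attributeID in the first. It uses the standard-library ElementTree so that it runs without extra installs; lxml.etree offers a largely compatible API for these calls. The sample documents are trimmed versions of the question's files with the missing tags filled in (the inner </attribute> and an opening <valueGroups> are my assumptions), and merge_by_attribute_id is a hypothetical helper name.

```python
import copy
import xml.etree.ElementTree as ET

# XML1 with the missing inner </attribute> restored (assumption).
HIERARCHY = """<hierachyAttributes>
    <attribute>
        <displayOrder>2</displayOrder>
        <attributeID>Demographics</attributeID>
        <children>
            <attribute>
                <displayOrder>1</displayOrder>
                <attributeID>age</attributeID>
            </attribute>
        </children>
    </attribute>
</hierachyAttributes>"""

# Trimmed XML2; the opening <valueGroups> tag is assumed.
DETAILS = """<diseaseAttributes>
    <diseaseName>Cancer</diseaseName>
    <metaAttributes>
        <attribute>
            <description>Age</description>
            <displayName>Age (years)</displayName>
            <attributeID>age</attributeID>
            <type>Double</type>
            <valueGroups>
                <group><displayName>0 - &lt; 10</displayName></group>
            </valueGroups>
        </attribute>
    </metaAttributes>
</diseaseAttributes>"""


def merge_by_attribute_id(hierarchy_root, details_root):
    """Copy detail children into hierarchy attributes with the same ID."""
    # Index the detail attributes by the text of their <attributeID>.
    details = {}
    for attr in details_root.iter("attribute"):
        key = attr.findtext("attributeID")
        if key is not None:
            details[key] = attr

    # list() so the tree is not modified while it is being walked.
    for attr in list(hierarchy_root.iter("attribute")):
        source = details.get(attr.findtext("attributeID"))
        if source is None:
            continue
        existing = {child.tag for child in attr}
        for child in source:
            if child.tag not in existing:
                # Copy so the details tree is left untouched.
                attr.append(copy.deepcopy(child))
    return hierarchy_root


merged = merge_by_attribute_id(ET.fromstring(HIERARCHY), ET.fromstring(DETAILS))
age = [a for a in merged.iter("attribute") if a.findtext("attributeID") == "age"][0]
print([child.tag for child in age])
# ['displayOrder', 'attributeID', 'description', 'displayName', 'type', 'valueGroups']
```

Tags already present on the hierarchy side (here displayOrder and attributeID) are kept as they are, so the first file wins on conflicts; whether that is the right policy, and which root element the result should be wrapped in, depends on what the combined document is meant to represent.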