I have two XML files that I am trying to merge.
XML1:
<hierachyAttributes>
<attribute>
<displayOrder>2</displayOrder>
<attributeID>Demographics</attributeID>
<children>
<attribute>
<displayOrder>1</displayOrder>
<attributeID>age</attributeID>
</children>
</attribute>
</hierachyAttributes>
XML2:
<diseaseAttributes>
<diseaseName>Cancer</diseaseName>
<diseaseID>1322843</diseaseID>
<metaAttributes>
<attribute>
<description>Age</description>
<displayName>Age (years)</displayName>
<attributeID>age</attributeID>
<type>Double</type>
<attributeCategory>Clinical</attributeCategory>
<displayInSummary>TRUE</displayInSummary>
<valueGroups>
<group>
<displayOrder>1</displayOrder>
<displayName>0 - &lt; 10</displayName>
<minValue>0</minValue>
<minInclusive>TRUE</minInclusive>
<maxValue>10</maxValue>
<maxInclusive>FALSE</maxInclusive>
</group>
</valueGroups>
</attribute>
</metaAttributes>
</diseaseAttributes>
Is there a way to merge them as shown below, even though the root tags differ (here hierachyAttributes and diseaseAttributes)?
CombinedXML:
<hierachyAttributes>
<diseaseAttributes>
<diseaseName>Cancer</diseaseName>
<diseaseID>1322843</diseaseID>
<metaAttributes>
<attribute>
<displayOrder>2</displayOrder>
<attributeID>Demographics</attributeID>
<children>
<attribute>
<displayOrder>1</displayOrder>
<attributeID>age</attributeID>
<description>Age</description>
<displayName>Age (years)</displayName>
<type>Double</type>
<attributeCategory>Clinical</attributeCategory>
<displayInSummary>TRUE</displayInSummary>
<valueGroups>
<group>
<displayOrder>1</displayOrder>
<displayName>0 - &lt; 10</displayName>
<minValue>0</minValue>
<minInclusive>TRUE</minInclusive>
<maxValue>10</maxValue>
<maxInclusive>FALSE</maxInclusive>
</group>
</valueGroups>
</attribute>
</children>
</metaAttributes>
</diseaseAttributes>
</hierachyAttributes>
That is, merge them wherever the attributeID matches. I tried the following, but it only concatenates the XML files.
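#!/usr/bin/env python
import sys
from xml.etree import ElementTree


def run(files):
    first = None
    for filename in files:
        data = ElementTree.parse(filename).getroot()
        if first is None:
            first = data
        else:
            # extend() appends the other root's children; nothing is matched by attributeID
            first.extend(data)
    if first is not None:
        # encoding="unicode" makes tostring() return str instead of bytes on Python 3
        print(ElementTree.tostring(first, encoding="unicode"))


if __name__ == "__main__":
    run(sys.argv[1:])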
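Alternatively, if one root tag were replaced with the other and I wanted the same output under a single root node, namely diseaseAttributes, how would I achieve that?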
Answer 0 (score: 3)
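Your first XML file is missing the closing </attribute> tag under <children>. The files are also very badly structured: ridiculously verbose and confusingly named, so I don't actually think I can tell what you are trying to do.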
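A quick way to catch this kind of problem before attempting any merge is to let the parser report it. The following is only a minimal sketch using the standard library's xml.etree.ElementTree; the helper name parse_error and the BROKEN/FIXED strings are made up for illustration, with BROKEN copied from XML1 in the question.

```python
import xml.etree.ElementTree as ET

# XML1 exactly as posted: the inner <attribute> is never closed.
BROKEN = """\
<hierachyAttributes>
<attribute>
<displayOrder>2</displayOrder>
<attributeID>Demographics</attributeID>
<children>
<attribute>
<displayOrder>1</displayOrder>
<attributeID>age</attributeID>
</children>
</attribute>
</hierachyAttributes>
"""

# Same document with the missing </attribute> added before </children>.
FIXED = BROKEN.replace("</children>", "</attribute>\n</children>")


def parse_error(xml_text):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


print(parse_error(BROKEN))  # an error message from the parser
print(parse_error(FIXED))   # None
```

ElementTree.parse(filename) raises the same ParseError for a file on disk, so the script in the question would stop at the first file until the tag is added.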
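The first file looks as though it just expresses a tree of "attribute" relationships. It is the second one I don't get: it seems to contain the definition of an attribute named "Age" and what type of data it holds, but it sits underneath "Cancer". Why? My guess is that you want to display results broken down by age, but why would Age be tied to Cancer? What happens if you also have age data for, say, winter flu deaths: does that get its own separate Age attribute?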
Actually, my first question is... is this how XML2 is supposed to work:
<disease-definitions>
<disease-definition id="1322843">
<name>Cancer</name>
<attribute-definitions>
<attribute id="age" category="Clinical">
<description>Age</description>
<displayName>Age (years)</displayName>
<type>Double</type>
<attribute-summary displayed="true">
<group>
<displayName>&lt; 10</displayName>
<range type="half-open">
<min>0</min>
<max>10</max>
</range>
</group>
<group>
<displayName>10 - 20</displayName>
<range type="half-open">
<min>10</min>
<max>20</max>
</range>
</group>
</attribute-summary>
</attribute>
</attribute-definitions>
</disease-definition>
<disease-definition id="1322844">
<name>Influenza</name>
<attribute-definitions>
<attribute id="age" category="Clinical">
<description>Age</description>
<displayName>Age (years)</displayName>
<type>Double</type>
<attribute-summary displayed="true">
<group>
<displayName>Children</displayName>
<range type="half-open">
<min>0</min>
<max>18</max>
</range>
</group>
<group>
<displayName>Adults</displayName>
<range type="half-open">
<min>18</min>
<max>60</max>
</range>
</group>
<group>
<displayName>Elderly</displayName>
<range type="half-open">
<min>60</min>
</range>
</group>
</attribute-summary>
</attribute>
</attribute-definitions>
</disease-definition>
</disease-definitions>
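Because that seems to be what you are implying, and it is pretty horrible even at the small size I have written here. I am also not sure how the hierarchy information fits in there.
Are the attributes and their hierarchy only about displaying the data? Even so, something like this seems better: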
<attribute id="demographics">
<title>Demographics</title>
<children>
<child id="age" />
<child id="gender" />
</children>
</attribute>
<attribute id="epidemiology">
<title>Epidemiology</title>
<children>
<child id="reported-date" />
<child id="variant-strains" />
</children>
</attribute>
<attribute id="age">
<title>Age</title>
<description>Age in years</description>
<category>Clinical</category>
<data type="double">
<min-value>0</min-value>
</data>
</attribute>
<attribute id="gender">
<title>Gender</title>
<data type="options">
<one-of>
<option id="M">
<title>Male</title>
</option>
<option id="F">
<title>Female</title>
</option>
</one-of>
</data>
</attribute>
and then:
<disease-definitions>
<disease id="1322843">
<displayName>Cancer</displayName>
<disease-attributes>
<attribute ref-id="age">
<displayName>Age of death</displayName>
<displayed-in-summary>true</displayed-in-summary>
<display-data format="histogram">
<range max="10">Up to 10</range>
<range min="10" max="25">Teenagers &amp; young adults</range>
<range min="25" max="55">Adults</range>
<range min="55">Elderly</range>
</display-data>
</attribute>
<attribute ref-id="gender">
<displayName>Gender of death</displayName>
<displayed-in-summary>true</displayed-in-summary>
<display-data format="pie">
<slice option-id="M" background="#44F">Male deaths</slice>
<slice option-id="F" background="#F44">Female deaths</slice>
</display-data>
</attribute>
</disease-attributes>
</disease>
<disease id="1322844">
<displayName>Influenza</displayName>
<disease-attributes>
<attribute ref-id="age">
<displayName>Age of death</displayName>
<displayed-in-summary>true</displayed-in-summary>
<display-data format="grouped">
<range max="10">Up to 10</range>
<range min="10" max="25">Teenagers &amp; young adults</range>
<range min="25" max="55">Adults</range>
<range min="55">Elderly</range>
</display-data>
</attribute>
</disease-attributes>
</disease>
</disease-definitions>
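With a layout like that, each disease refers to a shared attribute definition by ref-id instead of repeating it, so "merging" becomes a simple lookup. Here is a rough sketch of that lookup with xml.etree.ElementTree. The <attributes> wrapper element and the helper names index_by_id and resolve_refs are assumptions made for the example (the definitions above have no root element), and the sample data is trimmed down.

```python
import xml.etree.ElementTree as ET

# Shared definitions; the <attributes> root is a hypothetical wrapper.
DEFINITIONS = """\
<attributes>
  <attribute id="age"><title>Age</title></attribute>
  <attribute id="gender"><title>Gender</title></attribute>
</attributes>
"""

# Per-disease display settings that point at definitions via ref-id.
DISEASES = """\
<disease-definitions>
  <disease id="1322843">
    <displayName>Cancer</displayName>
    <disease-attributes>
      <attribute ref-id="age"><displayName>Age of death</displayName></attribute>
      <attribute ref-id="gender"><displayName>Gender of death</displayName></attribute>
    </disease-attributes>
  </disease>
</disease-definitions>
"""


def index_by_id(root):
    """Map each definition's id attribute to its element."""
    return {el.get("id"): el for el in root.iter("attribute") if el.get("id")}


def resolve_refs(diseases_root, definitions):
    """For each disease, pair the shared title with the disease-specific label."""
    resolved = {}
    for disease in diseases_root.iter("disease"):
        pairs = []
        for ref in disease.iter("attribute"):
            definition = definitions.get(ref.get("ref-id"))
            title = definition.findtext("title") if definition is not None else None
            pairs.append((title, ref.findtext("displayName")))
        resolved[disease.findtext("displayName")] = pairs
    return resolved


definitions = index_by_id(ET.fromstring(DEFINITIONS))
resolved = resolve_refs(ET.fromstring(DISEASES), definitions)
print(resolved)
```

A ref-id with no matching definition comes back with a title of None here; whether that should be an error instead depends on how strict you want the data to be.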
Answer 1 (score: 0)
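I think you would be best served by installing the lxml module with
pip install lxml
and using it for any XML-related code, since it is nicer to work with than the built-in modules. Take a look at the lxml tutorial: there are many ways to load, parse, and process the attribute elements from each file within a single process.
More useful information is available online as well.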
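For what it is worth, here is one possible sketch of the merge by attributeID that the question asks about. It uses the standard library's xml.etree.ElementTree so that it runs without extra installs; lxml.etree offers a largely compatible API, so it should port with few changes. The function name merge_by_attribute_id is made up, the sample data is a trimmed copy of the question's files with the missing </attribute> added, and the choice of diseaseAttributes as the single root is an assumption based on the last part of the question.

```python
import xml.etree.ElementTree as ET

HIERARCHY = """\
<hierachyAttributes>
  <attribute>
    <displayOrder>2</displayOrder>
    <attributeID>Demographics</attributeID>
    <children>
      <attribute>
        <displayOrder>1</displayOrder>
        <attributeID>age</attributeID>
      </attribute>
    </children>
  </attribute>
</hierachyAttributes>
"""

DISEASE = """\
<diseaseAttributes>
  <diseaseName>Cancer</diseaseName>
  <diseaseID>1322843</diseaseID>
  <metaAttributes>
    <attribute>
      <description>Age</description>
      <displayName>Age (years)</displayName>
      <attributeID>age</attributeID>
      <type>Double</type>
    </attribute>
  </metaAttributes>
</diseaseAttributes>
"""


def merge_by_attribute_id(hierarchy_root, disease_root):
    """Copy each disease attribute's details into the hierarchy node with the
    same attributeID, then hang the hierarchy under <metaAttributes>."""
    meta = disease_root.find("metaAttributes")
    # Index the flat attribute list by its attributeID text.
    details = {a.findtext("attributeID"): a for a in meta.findall("attribute")}

    # Visit every <attribute> in the hierarchy, including nested ones.
    for node in hierarchy_root.iter("attribute"):
        extra = details.get(node.findtext("attributeID"))
        if extra is None:
            continue
        present = {child.tag for child in node}
        # Add only the child elements the hierarchy node does not already have.
        node.extend([child for child in extra if child.tag not in present])

    # Replace the flat list with the merged hierarchy.
    for old in list(meta):
        meta.remove(old)
    meta.extend(list(hierarchy_root))
    return disease_root


merged = merge_by_attribute_id(ET.fromstring(HIERARCHY), ET.fromstring(DISEASE))
print(ET.tostring(merged, encoding="unicode"))
```

Disease attributes whose attributeID does not appear anywhere in the hierarchy are dropped by this sketch, and a tag that exists on both sides keeps the hierarchy's value; both are design choices you may want to change.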