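I need to use Python (xml.etree.ElementTree) to parse a large XML file, process it, and generate a report like the one shown in the expected output below.
For some of the details I can't figure out how to descend to the 4th level and, from each 4th-level element, get its related data at the 5th level. My problem is where to put the loop and how to reference the children. Any suggestions are welcome, thanks.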
Input XML File: raw_data.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<FirstLevel Flevel="my1">
  <SecondLevel Slevel="my2">
    <ThirdLevel Tlevel="my3">
      <FourthLevel test="1" mydata="Needed1">
        <FifthLevel associated="Required for Needed1"/>
      </FourthLevel>
      <FourthLevel test="2" mydata="Needed2">
        <FifthLevel associated="Required for Needed2"/>
      </FourthLevel>
      <FourthLevel test="3" mydata="Needed3">
        <FifthLevel associated="Required for Needed3-1"/>
        <FifthLevel associated="Required for Needed3-2"/>
      </FourthLevel>
      <FourthLevel test="4" mydata="Needed4">
        <FifthLevel associated="Required for Needed4-1"/>
        <FifthLevel associated="Required for Needed4-2"/>
      </FourthLevel>
    </ThirdLevel>
  </SecondLevel>
</FirstLevel>
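XML tag names are case-sensitive, so a closing tag such as </Fourthlevel> does not close <FourthLevel>, and ElementTree refuses to parse the document at all. The sketch below shows this with two tiny hypothetical snippets (not the real file), using only ET.fromstring and ET.ParseError from the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets: the closing tag's case differs in `bad` only.
bad = '<FourthLevel test="1"><FifthLevel/></Fourthlevel>'
good = '<FourthLevel test="1"><FifthLevel/></FourthLevel>'

try:
    ET.fromstring(bad)
    bad_parsed = True
except ET.ParseError as exc:
    bad_parsed = False
    # The message reports a mismatched tag with its line/column position.
    print("ParseError:", exc)

good_root = ET.fromstring(good)
print(good_root.tag)  # FourthLevel
```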
-----------------------------------------------------------
My Code:
import xml.etree.ElementTree as ET
tree = ET.parse('raw_data.xml')
root = tree.getroot()
mylevel = root.findall('.//FourthLevel')
for i in mylevel:
    print("mydata=", i.get('mydata'), "\t")
    assoc = root.findall('.//FifthLevel')  ### assoc: Temporary variable for associated data
    for j in assoc:
        print("associated=", j.get('associated'), "\n")
Expected output (final_output.txt):
mydata=Needed1 associated=Required for Needed1
mydata=Needed2 associated=Required for Needed2
mydata=Needed3 associated=Required for Needed3-1
mydata=Needed3 associated=Required for Needed3-2
mydata=Needed4 associated=Required for Needed4-1
mydata=Needed4 associated=Required for Needed4-2
Answer 0 (score: 0)
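You are already iterating over the elements under root that match ".//FourthLevel" (every FourthLevel at any depth). Apply the same idea one level further down: for each of those elements, look for its children named "FifthLevel" (note that this path has no leading ".//").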
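To make the difference in search scope concrete, here is a small sketch using a trimmed copy of the question's XML (two FourthLevel elements only, an assumption for brevity). It compares ".//FifthLevel" searched from root with "FifthLevel" searched from each FourthLevel element.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (two FourthLevel elements only).
XML = """<FirstLevel Flevel="my1">
  <SecondLevel Slevel="my2">
    <ThirdLevel Tlevel="my3">
      <FourthLevel test="1" mydata="Needed1">
        <FifthLevel associated="Required for Needed1"/>
      </FourthLevel>
      <FourthLevel test="3" mydata="Needed3">
        <FifthLevel associated="Required for Needed3-1"/>
        <FifthLevel associated="Required for Needed3-2"/>
      </FourthLevel>
    </ThirdLevel>
  </SecondLevel>
</FirstLevel>"""

root = ET.fromstring(XML)

# './/FifthLevel' searched from root matches every FifthLevel in the whole tree.
everywhere = root.findall(".//FifthLevel")
print(len(everywhere))  # 3

# 'FifthLevel' searched from one FourthLevel matches only its direct children.
per_parent = {
    fourth.get("mydata"): len(fourth.findall("FifthLevel"))
    for fourth in root.findall(".//FourthLevel")
}
print(per_parent)  # {'Needed1': 1, 'Needed3': 2}
```

Searching from root is why the original inner loop prints every associated value under every mydata.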
In code, that means replacing this line:
assoc=root.findall('.//FifthLevel')
with:
assoc = i.findall("FifthLevel")
because you only need the 5th-level children of the current (4th-level) node, not every FifthLevel in the whole tree. See [Python 3]: xml.etree.ElementTree - The ElementTree XML API for more details.
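Putting the fix together, a minimal sketch of the whole report might look like the following. It assumes raw_data.xml has the structure shown in the question (with matching closing tags) and that one output line per FifthLevel, in the expected-output format, is what is wanted. build_report and write_report are hypothetical helper names, not part of ElementTree, and the UTF-8 output encoding is an assumption. An inline sample is used so the sketch runs without the real file.

```python
import xml.etree.ElementTree as ET


def build_report(root):
    """Return one 'mydata=... associated=...' line per FifthLevel (hypothetical helper)."""
    lines = []
    # Level 4: every FourthLevel anywhere below root.
    for fourth in root.findall(".//FourthLevel"):
        mydata = fourth.get("mydata")
        # Level 5: only the direct FifthLevel children of *this* FourthLevel.
        for fifth in fourth.findall("FifthLevel"):
            lines.append(f"mydata={mydata} associated={fifth.get('associated')}")
    return lines


def write_report(xml_path="raw_data.xml", out_path="final_output.txt"):
    """Parse the real file and write the report (hypothetical helper, not called here)."""
    root = ET.parse(xml_path).getroot()
    with open(out_path, "w", encoding="utf-8") as out:
        for line in build_report(root):
            out.write(line + "\n")


# Small inline sample so the sketch runs without raw_data.xml.
SAMPLE = """<FirstLevel><SecondLevel><ThirdLevel>
  <FourthLevel test="2" mydata="Needed2">
    <FifthLevel associated="Required for Needed2"/>
  </FourthLevel>
  <FourthLevel test="4" mydata="Needed4">
    <FifthLevel associated="Required for Needed4-1"/>
    <FifthLevel associated="Required for Needed4-2"/>
  </FourthLevel>
</ThirdLevel></SecondLevel></FirstLevel>"""

report = build_report(ET.fromstring(SAMPLE))
for line in report:
    print(line)
# mydata=Needed2 associated=Required for Needed2
# mydata=Needed4 associated=Required for Needed4-1
# mydata=Needed4 associated=Required for Needed4-2
```

Calling write_report() with the default paths would produce final_output.txt from the real raw_data.xml; ET.parse honours the file's ISO-8859-1 declaration when reading.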