Python: Parsing a large XML file with multiple child levels under the root

Time: 2019-02-05 06:39:37

Tags: python xml parsing

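I need to parse a large XML file with Python (xml.etree.ElementTree), process it, and generate a report like the one shown in the expected output section below.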

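Specifically, I can't figure out how to walk down to the fourth level and, from there, get the fifth-level data associated with each fourth-level element. My question is where to put the loop and how to reference the children. Any suggestions would be appreciated, thanks.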

Input XML File: raw_data.xml

<?xml version="1.0" encoding="ISO-8859-1"?>
<FirstLevel Flevel="my1">
    <SecondLevel Slevel="my2">
        <ThirdLevel Tlevel="my3">
            <FourthLevel test="1" mydata="Needed1">
                <FifthLevel associated="Required for Needed1"/>
            </FourthLevel>
            <FourthLevel test="2" mydata="Needed2">
                <FifthLevel associated="Required for Needed2"/>
            </FourthLevel>
            <FourthLevel test="3" mydata="Needed3">
                <FifthLevel associated="Required for Needed3-1"/>
                <FifthLevel associated="Required for Needed3-2"/>
            </FourthLevel>
            <FourthLevel test="4" mydata="Needed4">
                <FifthLevel associated="Required for Needed4-1"/>
                <FifthLevel associated="Required for Needed4-2"/>
            </FourthLevel>
        </ThirdLevel>
    </SecondLevel>
</FirstLevel>
-----------------------------------------------------------

My Code:

    import xml.etree.ElementTree as ET
    tree = ET.parse('raw_data.xml')
    root=tree.getroot()
    mylevel=root.findall('.//FourthLevel')
    for i in mylevel:
        print ("mydata=",i.get('mydata'),"\t")
        assoc=root.findall('.//FifthLevel') ### assoc: Temporary variable for associated data
        for j in assoc:
             print ("associated=",j.get('associated'),"\n")

Expected output: final_output.txt

mydata=Needed1  associated=Required for Needed1
mydata=Needed2  associated=Required for Needed2
mydata=Needed3  associated=Required for Needed3-1
mydata=Needed3  associated=Required for Needed3-2
mydata=Needed4  associated=Required for Needed4-1
mydata=Needed4  associated=Required for Needed4-2

1 Answer:

Answer 0 (score: 0)

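You are already iterating over the elements under root that match ".//FourthLevel". You only need to apply the same principle to each of those elements and its children named "FifthLevel" (note the missing ".//" prefix, which makes the search relative to the current element's direct children).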
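To make the difference between the two paths concrete, here is a minimal sketch. It assumes a trimmed-down inline copy of the question's XML (parsed with ET.fromstring so it runs on its own); the names XML, all_fifth and per_fourth are only illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the question's XML, inlined so the sketch is self-contained.
XML = """
<FirstLevel Flevel="my1">
    <SecondLevel Slevel="my2">
        <ThirdLevel Tlevel="my3">
            <FourthLevel test="1" mydata="Needed1">
                <FifthLevel associated="Required for Needed1"/>
            </FourthLevel>
            <FourthLevel test="3" mydata="Needed3">
                <FifthLevel associated="Required for Needed3-1"/>
                <FifthLevel associated="Required for Needed3-2"/>
            </FourthLevel>
        </ThirdLevel>
    </SecondLevel>
</FirstLevel>
"""

root = ET.fromstring(XML)

# './/FifthLevel' searched from root: every FifthLevel in the whole tree.
all_fifth = root.findall('.//FifthLevel')

# 'FifthLevel' searched from a FourthLevel element: only its direct children.
per_fourth = {
    fourth.get('mydata'): [fifth.get('associated')
                           for fifth in fourth.findall('FifthLevel')]
    for fourth in root.findall('.//FourthLevel')
}

print(len(all_fifth))
print(per_fourth)
```

Searching from root returns the same full list on every pass of the outer loop, which is why the question's code repeats all associated values under each mydata.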

Translated into code, you only need to replace the following line:

assoc=root.findall('.//FifthLevel')

with:

assoc = i.findall("FifthLevel")

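because you only need the fifth-level children of the current node (the fourth-level one), not every FifthLevel element in the whole tree. Check [Python 3]: xml.etree.ElementTree - The ElementTree XML API for more details.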
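Putting it together, one possible version of the whole script is sketched below. It assumes the sample XML from the question with the closing tags spelled </FourthLevel> (XML is case-sensitive, so a </Fourthlevel> closing tag would make the parser raise an error), and it inlines that XML so the example runs without raw_data.xml. The helper name build_report and the tab-separated line format are my own choices, not something the question prescribes.

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's sample data; for the real file you would use
# root = ET.parse('raw_data.xml').getroot() instead of ET.fromstring(SAMPLE).
SAMPLE = """
<FirstLevel Flevel="my1">
    <SecondLevel Slevel="my2">
        <ThirdLevel Tlevel="my3">
            <FourthLevel test="1" mydata="Needed1">
                <FifthLevel associated="Required for Needed1"/>
            </FourthLevel>
            <FourthLevel test="2" mydata="Needed2">
                <FifthLevel associated="Required for Needed2"/>
            </FourthLevel>
            <FourthLevel test="3" mydata="Needed3">
                <FifthLevel associated="Required for Needed3-1"/>
                <FifthLevel associated="Required for Needed3-2"/>
            </FourthLevel>
            <FourthLevel test="4" mydata="Needed4">
                <FifthLevel associated="Required for Needed4-1"/>
                <FifthLevel associated="Required for Needed4-2"/>
            </FourthLevel>
        </ThirdLevel>
    </SecondLevel>
</FirstLevel>
"""


def build_report(root):
    """Return one 'mydata=...<TAB>associated=...' line per FifthLevel child."""
    lines = []
    for fourth in root.findall('.//FourthLevel'):
        # Search relative to the current FourthLevel, not the whole tree.
        for fifth in fourth.findall('FifthLevel'):
            lines.append("mydata={}\tassociated={}".format(
                fourth.get('mydata'), fifth.get('associated')))
    return lines


root = ET.fromstring(SAMPLE)
report = build_report(root)
print("\n".join(report))
```

Each FourthLevel with two FifthLevel children yields two lines, so the sample produces six lines in total. Writing them to final_output.txt would just be a matter of opening that file and writing the joined string.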