How to convert this structure into a table of contents in Python?

Date: 2016-03-17 17:02:41

Tags: python recursion structure

I have this structure in XML:

<chapter name="Chapter 1">
    <page href="6189242584662016.xml" id="PmdrLF" name="Page 1" preview="" reportable="false"/>
    <chapter name="Unit">
        <page href="6274488671928320.xml" id="z4859l" name="Page 2" preview="" reportable="false"/>
        <page href="5159758788034560.xml" id="svTnDD" name="Page 3" preview="" reportable="false"/>
        <chapter name="SubUnit">
            <page href="4679923007488000.xml" id="cEspy9" name="Page 4" preview="" reportable="true"/>
            <page href="5504349496147968.xml" id="KjQ7bG" name="Page 5" preview="" reportable="true"/>
            <chapter name="Subsubunit">
                <page href="5781185908178944.xml" id="3GMqVp" name="Page 6" preview="" reportable="true"/>
                <page href="5938154077945856.xml" id="BRL9vi" name="Page 7" preview="" reportable="true"/>
                <page href="4872313035030528.xml" id="e5KpyU" name="Page 8" preview="" reportable="true"/>
            </chapter>
        </chapter>
    </chapter>
</chapter>
<chapter name="Chapter 2">
    <page href="5422180966858752.xml" id="0vZ25G" name="Page 9" preview="" reportable="false"/>
    <chapter name="SubChapter 1">
        <page href="6049587004440576.xml" id="vRWo4F" name="Page 10" preview="" reportable="true"/>
        <page href="6302141382656000.xml" id="JQ31J8" name="Page 11" preview="" reportable="true"/>
    </chapter>
</chapter>

I would like to print this structure in Python, and be able to add a score to each page in a simple way.
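One way to make per-page scores easy is to convert the XML once into nested dicts and keep a second index of the same page dicts keyed by their `id`. The sketch below assumes a trimmed copy of the XML (only `id`, `name` and `reportable` kept), and `build_tree`, `pages` and the `score` field are hypothetical names that are not part of the original data.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (href/preview dropped); like the
# original it has two top-level <chapter> elements and no single root.
XML = """
<chapter name="Chapter 1">
    <page id="PmdrLF" name="Page 1" reportable="false"/>
    <chapter name="Unit">
        <page id="z4859l" name="Page 2" reportable="false"/>
        <page id="svTnDD" name="Page 3" reportable="false"/>
    </chapter>
</chapter>
<chapter name="Chapter 2">
    <page id="0vZ25G" name="Page 9" reportable="false"/>
</chapter>
"""


def build_tree(element, pages_by_id):
    """Turn a <chapter> element into a nested dict, recursing into sub-chapters.

    Every page dict is also registered in pages_by_id, so a page can be found
    by its id without walking the tree. "score" is a hypothetical field that
    is not in the XML; it starts as None.
    """
    node = {"name": element.get("name"), "is_page": False, "children": []}
    for child in element:
        if child.tag == "chapter":
            node["children"].append(build_tree(child, pages_by_id))
        elif child.tag == "page":
            page = {"name": child.get("name"), "is_page": True, "score": None}
            node["children"].append(page)
            pages_by_id[child.get("id")] = page  # same object as in the tree
    return node


root = ET.fromstring("<root>" + XML + "</root>")
pages = {}
toc = [build_tree(chapter, pages) for chapter in root.findall("chapter")]

# Setting a score through the index also updates the nested tree
pages["svTnDD"]["score"] = 10
```

Because both structures share the same dict objects, the tree can be printed recursively while scores are set through the flat index.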

I have this function:

def get_hierarchy(element, pages, page_number, chapter_level, chapter_number):
    # element is a minidom node; pages collects one dict per chapter/page
    for node in element.childNodes:
        # Skip whitespace text nodes and comments, which have no getAttribute()
        if node.nodeType != node.ELEMENT_NODE:
            continue
        # Every page is numbered, including the non-reportable ones skipped below
        if node.nodeName == 'page':
            page_number += 1
        if node.nodeName == 'folder' and node.getAttribute('name') == 'commons':
            continue
        if node.nodeName in ('footer', 'header'):
            continue
        if node.nodeName == 'page' and node.getAttribute('reportable') == 'false':
            continue
        if node.nodeName == 'page':
            pages.append({'name': node.getAttribute('name'), 'page_number': page_number,
                          'parent': chapter_level, 'is_page': True,
                          'chapter_number': chapter_number})
        else:
            chapter_number += 1
            pages.append({'name': node.getAttribute('name'), 'parent': chapter_level,
                          'is_page': False, 'chapter_number': chapter_number})
        if node.nodeName == 'chapter':
            # The new chapter's number becomes the parent of its children;
            # chapter_level itself stays unchanged for the following siblings
            page_number, chapter_number = get_hierarchy(
                node, pages, page_number, chapter_number, chapter_number)

    return page_number, chapter_number

But when I want to print the table of contents, I still have to iterate over the result recursively, which is hard.
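If the recursion itself is the obstacle, the tree can also be walked with an explicit stack of `(element, depth)` pairs, which yields the TOC lines in document order. This is a minimal sketch on a trimmed copy of the XML; `toc_lines` is a hypothetical helper name, and four spaces per level is an arbitrary choice of indentation.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (only the names are kept)
XML = """
<chapter name="Chapter 1">
    <page name="Page 1"/>
    <chapter name="Unit">
        <page name="Page 2"/>
        <page name="Page 3"/>
    </chapter>
</chapter>
<chapter name="Chapter 2">
    <page name="Page 9"/>
</chapter>
"""


def toc_lines(root):
    """Yield indented TOC lines using an explicit stack instead of recursion."""
    # Push top-level chapters in reverse so they are popped in document order
    stack = [(chapter, 0) for chapter in reversed(root.findall("chapter"))]
    while stack:
        element, depth = stack.pop()
        yield "    " * depth + element.get("name")
        if element.tag == "chapter":
            # Children are also pushed in reverse so they come out in XML order
            for child in reversed(list(element)):
                if child.tag in ("chapter", "page"):
                    stack.append((child, depth + 1))


root = ET.fromstring("<root>" + XML + "</root>")
for line in toc_lines(root):
    print(line)
```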

How can I parse this XML structure so that I can print it easily, i.e. as HTML?
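For HTML output, the nesting of `<chapter>` elements maps directly onto nested `<ul>` lists. The sketch below uses `xml.etree.ElementTree` and `html.escape` from the standard library on a trimmed copy of the XML; `to_html` is a hypothetical name, and linking each page to its `href` value is an assumption about what the links should point to.

```python
import html
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML: only name/href are kept, and, like the
# original, there is no single root element.
XML = """
<chapter name="Chapter 1">
    <page href="6189242584662016.xml" name="Page 1"/>
    <chapter name="Unit">
        <page href="6274488671928320.xml" name="Page 2"/>
    </chapter>
</chapter>
<chapter name="Chapter 2">
    <page href="5422180966858752.xml" name="Page 9"/>
</chapter>
"""


def to_html(chapter):
    """Render a <chapter> and everything below it as one nested <li>."""
    parts = ["<li>" + html.escape(chapter.get("name"))]
    children = [c for c in chapter if c.tag in ("chapter", "page")]
    if children:
        parts.append("<ul>")
        for child in children:
            if child.tag == "chapter":
                parts.append(to_html(child))  # recurse into the sub-chapter
            else:
                # Pages become links to their href file
                parts.append('<li><a href="%s">%s</a></li>' % (
                    html.escape(child.get("href")), html.escape(child.get("name"))))
        parts.append("</ul>")
    parts.append("</li>")
    return "".join(parts)


root = ET.fromstring("<root>" + XML + "</root>")
toc_html = "<ul>" + "".join(to_html(c) for c in root.findall("chapter")) + "</ul>"
print(toc_html)
```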

1 answer:

Answer 0 (score: 0)

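import xml.etree.ElementTree as ET


def process_chap(node, base_string):
    # Append this chapter's name to the path built so far and print it
    base_string += node.get("name")
    print(base_string)
    # Recurse into the direct child chapters, extending the path with "->"
    for subnode in node.findall("./chapter"):
        process_chap(subnode, base_string + "->")


# The file has two top-level <chapter> elements and no single root element,
# so wrap its contents in a dummy <root> before parsing
with open("/tmp/test.xml") as f:
    root = ET.fromstring("<root>" + f.read() + "</root>")
for chapter in root.findall("./chapter"):
    process_chap(chapter, "")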

Sample output

Chapter 1
Chapter 1->Unit
Chapter 1->Unit->SubUnit
Chapter 1->Unit->SubUnit->Subsubunit
Chapter 2
Chapter 2->SubChapter 1
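The answer prints only chapters. A variant that also lists pages in the same "->" format, and skips pages with `reportable="false"` the way the question's `get_hierarchy` does, could look like the sketch below. It returns lines from a generator instead of printing inside the recursion, so the caller can print them or turn them into HTML; `toc_paths` is a hypothetical name and the XML is a trimmed copy.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (only name/reportable kept)
XML = """
<chapter name="Chapter 1">
    <page name="Page 1" reportable="false"/>
    <chapter name="Unit">
        <page name="Page 2" reportable="false"/>
        <chapter name="SubUnit">
            <page name="Page 4" reportable="true"/>
        </chapter>
    </chapter>
</chapter>
"""


def toc_paths(node, prefix=""):
    """Yield "A->B->C" paths for a chapter, its sub-chapters and its pages."""
    path = prefix + node.get("name")
    yield path
    for child in node:
        if child.tag == "chapter":
            yield from toc_paths(child, path + "->")
        elif child.tag == "page" and child.get("reportable") != "false":
            # Mirrors the question's filter: non-reportable pages are skipped
            yield path + "->" + child.get("name")


root = ET.fromstring("<root>" + XML + "</root>")
for chapter in root.findall("./chapter"):
    for line in toc_paths(chapter):
        print(line)
```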