I have this structure in XML:
<chapter name="Chapter 1">
    <page href="6189242584662016.xml" id="PmdrLF" name="Page 1" preview="" reportable="false"/>
    <chapter name="Unit">
        <page href="6274488671928320.xml" id="z4859l" name="Page 2" preview="" reportable="false"/>
        <page href="5159758788034560.xml" id="svTnDD" name="Page 3" preview="" reportable="false"/>
        <chapter name="SubUnit">
            <page href="4679923007488000.xml" id="cEspy9" name="Page 4" preview="" reportable="true"/>
            <page href="5504349496147968.xml" id="KjQ7bG" name="Page 5" preview="" reportable="true"/>
            <chapter name="Subsubunit">
                <page href="5781185908178944.xml" id="3GMqVp" name="Page 6" preview="" reportable="true"/>
                <page href="5938154077945856.xml" id="BRL9vi" name="Page 7" preview="" reportable="true"/>
                <page href="4872313035030528.xml" id="e5KpyU" name="Page 8" preview="" reportable="true"/>
            </chapter>
        </chapter>
    </chapter>
</chapter>
<chapter name="Chapter 2">
    <page href="5422180966858752.xml" id="0vZ25G" name="Page 9" preview="" reportable="false"/>
    <chapter name="SubChapter 1">
        <page href="6049587004440576.xml" id="vRWo4F" name="Page 10" preview="" reportable="true"/>
        <page href="6302141382656000.xml" id="JQ31J8" name="Page 11" preview="" reportable="true"/>
    </chapter>
</chapter>
I want to print this structure in Python, and I would also like a simple way to attach a score to each page.
I have this function:
def get_hierarchy(element, pages, page_number, chapter_level, chapter_number):
    for node in element.childNodes:
        if node.nodeName == 'page':
            page_number += 1
        if node.nodeName == 'folder' and node.getAttribute('name') == 'commons':
            continue
        if node.nodeName == 'footer' or node.nodeName == 'header':
            continue
        if node.nodeName == 'page' and node.getAttribute("reportable") == 'false':
            continue
        if node.nodeName == 'page':
            pages.append({'name': node.getAttribute("name"), 'page_number': page_number, 'parent': chapter_level, 'is_page': True, 'chapter_number': chapter_number})
        else:
            chapter_number += 1
            pages.append({'name': node.getAttribute("name"), 'parent': chapter_level, 'is_page': False, 'chapter_number': chapter_number})
        if node.nodeName == 'chapter':
            chapter_level = chapter_number
            page_number, chapter_number = get_hierarchy(node, pages, page_number, chapter_level, chapter_number)
            chapter_level = 0
    return page_number, chapter_number
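The function flattens everything into a single list, so when I want to print the table of contents I have to walk it recursively all over again, which is awkward.
How can I parse this XML structure so that it is easy to print, for example as HTML?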
Answer 0 (score: 0)
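import xml.etree.ElementTree as ET

def process_chap(node, base_string):
    # Append this chapter's name to the path built so far and print it.
    base_string += node.get("name")
    print(base_string)
    # Recurse into direct child chapters only, extending the path with "->".
    for subnode in node.findall("./chapter"):
        process_chap(subnode, base_string + "->")

# The file holds several top-level <chapter> elements, so wrap it in one root.
with open("/tmp/test.xml") as f:
    root = ET.fromstring("<root>" + f.read() + "</root>")
for chapter in root.findall("./chapter"):
    process_chap(chapter, "")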
Sample output:
Chapter 1
Chapter 1->Unit
Chapter 1->Unit->SubUnit
Chapter 1->Unit->SubUnit->Subsubunit
Chapter 2
Chapter 2->SubChapter 1
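
The code above prints only the chapter paths. To also cover the pages, the per-page scores, and the HTML output asked about in the question, one possible approach is to convert the XML into nested dicts first and render from those. The following is only a sketch under some assumptions: the helper names (build_tree, set_score, to_html), the "score" key, and the shortened sample document are all made up for illustration, and pages with reportable="false" are skipped because the function in the question skips them.

```python
import xml.etree.ElementTree as ET
from html import escape

# Shortened, hypothetical sample; the real file would be read from disk.
SAMPLE = """
<chapter name="Chapter 1">
  <page href="a.xml" id="p1" name="Page 1" reportable="false"/>
  <chapter name="Unit">
    <page href="b.xml" id="p2" name="Page 2" reportable="true"/>
  </chapter>
</chapter>
<chapter name="Chapter 2">
  <page href="c.xml" id="p3" name="Page 3" reportable="true"/>
</chapter>
"""

def build_tree(node):
    """Convert an element into a nested dict of chapters and reportable pages."""
    children = []
    for child in node:
        if child.tag == "chapter":
            children.append(build_tree(child))
        elif child.tag == "page" and child.get("reportable") == "true":
            children.append({"type": "page", "id": child.get("id"),
                             "name": child.get("name"), "score": None})
    return {"type": "chapter", "name": node.get("name"), "children": children}

def set_score(tree, page_id, score):
    """Attach a score to the page with the given id; return True if it was found."""
    for child in tree["children"]:
        if child["type"] == "page":
            if child["id"] == page_id:
                child["score"] = score
                return True
        elif set_score(child, page_id, score):
            return True
    return False

def to_html(tree):
    """Render the children of a chapter dict as nested <ul> lists."""
    items = []
    for child in tree["children"]:
        if child["type"] == "chapter":
            items.append("<li>" + escape(child["name"]) + to_html(child) + "</li>")
        else:
            label = escape(child["name"])
            if child["score"] is not None:
                label += " (score: %s)" % child["score"]
            items.append("<li>" + label + "</li>")
    return "<ul>" + "".join(items) + "</ul>" if items else ""

# Wrap the fragment in a single root, as in the answer above.
root = ET.fromstring("<root>" + SAMPLE + "</root>")
toc = build_tree(root)
set_score(toc, "p3", 8)
print(to_html(toc))
```

Because the tree is built once, printing the table of contents and updating scores are separate, simple walks over the same structure, and no chapter or page counters need to be threaded through the recursion.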