My input file looks like this:
<article>
<pages>
<list-item>content of page 1</list-item>
<list-item>content of page 2</list-item>
<list-item>content of page 3</list-item>
</pages>
</article>
I want to convert it into another XML file like this:
<text>
<page>content of page 1</page>
<page>content of page 2</page>
<page>content of page 3</page>
</text>
The following ugly code does what I want:
from lxml import etree

oldtree = etree.parse(infile)
newtree = etree.Element("text")
newtree.append(oldtree.find("pages"))
# Serialize to a text string, then patch the tag names by string replacement
outfile.write(etree.tostring(newtree, encoding="unicode").replace(u"<pages>", u"").replace(u"</pages>", u"").replace(u"<list-item>", u"<page>").replace(u"</list-item>", u"</page>"))
The ugly part is the mix of XML manipulation and brute-force string replacement. Is there a cleaner, more elegant way to achieve this?
Answer 0 (score: 4)
Something like this:
from lxml.etree import fromstring, tostring
text_tree = """
<article>
<pages>
<list-item>content of page 1</list-item>
<list-item>content of page 2</list-item>
<list-item>content of page 3</list-item>
</pages>
</article>
"""
pages = fromstring(text_tree).find('pages')
pages.tag = 'text'
# Rename each child element in place; text and attributes are kept
for list_item in pages.findall('list-item'):
    list_item.tag = 'page'
print(tostring(pages, encoding='unicode'))
which gives:
<text>
<page>content of page 1</page>
<page>content of page 2</page>
<page>content of page 3</page>
</text>
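If you would rather leave the parsed input untouched, a possible variation is to build a fresh output tree instead of renaming tags in place. The sketch below is only an illustration under a few assumptions: it uses the standard-library `xml.etree.ElementTree` (the same `find`/`findall`/`SubElement` calls also exist in `lxml.etree`), the helper name `pages_to_text` is made up for this example, and it copies only the text of each `list-item`, so nested markup inside a page would be dropped.

```python
import xml.etree.ElementTree as ET


def pages_to_text(article_xml):
    """Build a new <text> element from an <article> XML string.

    Hypothetical helper: only the text content of each <list-item>
    is copied, so child elements inside a page are not preserved.
    """
    article = ET.fromstring(article_xml)
    text = ET.Element("text")
    # "pages/list-item" is a path relative to the <article> root
    for item in article.findall("pages/list-item"):
        page = ET.SubElement(text, "page")
        page.text = item.text
    return text


source = (
    "<article><pages>"
    "<list-item>content of page 1</list-item>"
    "<list-item>content of page 2</list-item>"
    "</pages></article>"
)
result = pages_to_text(source)
print(ET.tostring(result, encoding="unicode"))
```

Renaming tags in place, as in the answer above, is shorter and keeps any nested content; building a new tree is mainly useful when the original document is still needed afterwards.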