Running the following simple script:
from lxml import etree

tree = etree.parse('VimKeys.xml')
root = tree.getroot()
for child in root:
    print("<table>")
    print("<caption>" + child.attrib['title'] + "</caption>")
    for child in child:
        print("<tr>")
        print("<th>" + child.text + "</th>")
        print("<td>" + child.attrib['description'] + "</td>")
        print("</tr>")
    print("</table>")
on the following XML:
<keycommands>
  <category title="Editing">
    <key description="replace">r</key>
    <key description="change, line">c,cc</key>
    <key description="join line with the following">J</key>
    <key description="delete &amp; insert">s</key>
    <key description="change case">~</key>
    <key description="apply last edit">.</key>
    <key description="undo, redo">u,⌃+r</key>
    <key description="indent line right, left">&gt;&gt;,&lt;&lt;</key>
    <key description="auto-indent line">==</key>
  </category>
</keycommands>
produces the following output:
<table>
<caption>Editing</caption>
<tr>
<th>r</th>
<td>replace</td>
</tr>
<tr>
<th>c,cc</th>
<td>change, line</td>
</tr>
<tr>
<th>J</th>
<td>join line with the following</td>
</tr>
<tr>
<th>s</th>
<td>delete & insert</td>
</tr>
<tr>
<th>~</th>
<td>change case</td>
</tr>
<tr>
<th>.</th>
<td>apply last edit</td>
</tr>
<tr>
<th>u,⌃+r</th>
<td>undo, redo</td>
</tr>
<tr>
<th>>>,<<</th>
<td>indent line right, left</td>
</tr>
<tr>
<th>==</th>
<td>auto-indent line</td>
</tr>
</table>
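This is invalid HTML because of the less-than and greater-than signs, which are quoted as &lt; and &gt; in the source document.
How can I preserve that escaping in the final output?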
Answer 0 (score: 1)
Use the Element class to build a new XML tree, rather than formatting it "by hand" with print:
import lxml.etree as ET

tree = ET.parse('VimKeys.xml')
root = tree.getroot()
newroot = ET.Element('root')
for i, child in enumerate(root):
    # One <table> per <category>
    table = ET.Element('table')
    newroot.insert(i, table)
    caption = ET.Element('caption')
    caption.text = child.attrib['title']
    table.insert(0, caption)
    for j, c in enumerate(child, 1):
        # One <tr> per <key>; start at 1 because the caption sits at index 0
        tr = ET.Element('tr')
        table.insert(j, tr)
        th = ET.Element('th')
        th.text = c.text
        tr.insert(0, th)
        td = ET.Element('td')
        td.text = c.attrib['description']
        tr.insert(1, td)
# encoding='unicode' returns a str instead of bytes, so print shows the markup itself
print(ET.tostring(newroot, encoding='unicode', pretty_print=True))
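To see why building a tree avoids the problem, here is a minimal sketch. It assumes the standard library's xml.etree.ElementTree as a stand-in for lxml, so that it runs without extra dependencies, and the one-element input and the variable names are made up for illustration. The parser decodes &gt; and &lt; into plain characters in .text, and the serializer escapes them again when the new element is written out:

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element source, escaped the way the question describes.
src = '<key description="indent line right, left">&gt;&gt;,&lt;&lt;</key>'
key = ET.fromstring(src)

# Parsing decodes the entities: .text holds the raw characters.
raw = key.text

# Assigning the raw text to a new element and serializing it
# lets the serializer escape the characters again.
th = ET.Element('th')
th.text = raw
html_cell = ET.tostring(th, encoding='unicode')

print(raw)
print(html_cell)
```

Concatenating key.text into a string with print skips that second step, which is where the unescaped signs in the question's output come from.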
Alternatively, use the E-factory. Doing so makes the intended structure easier to read (and modify):
import lxml.etree as ET
import lxml.builder as builder

tree = ET.parse('VimKeys.xml')
root = tree.getroot()
E = builder.E
tables = []
for child in root:
    # Collect one <tr> per <key> in this <category>
    trs = []
    for c in child:
        trs.append(E('tr',
                     E('th', c.text),
                     E('td', c.attrib['description'])))
    tables.append(E('table',
                    E('caption', child.attrib['title']),
                    *trs))
newroot = E('root', *tables)
# encoding='unicode' returns a str instead of bytes, so print shows the markup itself
print(ET.tostring(newroot, encoding='unicode', pretty_print=True))
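If you would rather keep the print-based script from the question, another option (not part of the answer above) is to escape each value yourself before printing it. The following is only a rough sketch using html.escape from the standard library. It assumes xml.etree.ElementTree and an inline sample string in place of lxml and VimKeys.xml so that it is self-contained, and render_tables is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET
from html import escape

# Inline sample standing in for VimKeys.xml (an assumption for this sketch).
SAMPLE = """<keycommands>
<category title="Editing">
<key description="delete &amp; insert">s</key>
<key description="indent line right, left">&gt;&gt;,&lt;&lt;</key>
</category>
</keycommands>"""


def render_tables(root):
    """Return the HTML lines for every <category>, escaping all text."""
    lines = []
    for category in root:
        lines.append("<table>")
        lines.append("<caption>" + escape(category.attrib['title']) + "</caption>")
        for key in category:
            lines.append("<tr>")
            lines.append("<th>" + escape(key.text) + "</th>")
            lines.append("<td>" + escape(key.attrib['description']) + "</td>")
            lines.append("</tr>")
        lines.append("</table>")
    return lines


html_lines = render_tables(ET.fromstring(SAMPLE))
print("\n".join(html_lines))
```

Building the tree, as in the answer, is usually the safer route, since with manual escaping every value has to be remembered individually.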