I have the following XML fragment:
<table>
<tr>
<td>Hello</td>
<td>Hello</td>
<td>
<p>Hello already in P</p>
</td>
<td>
This one has some naked text
<span>and some span wrapped text</span>
</td>
</tr>
</table>
I want to wrap (in a p tag) the contents of every cell that is not already wrapped in a p tag. So the output would be:
<table>
<tr>
<td><p>Hello</p></td>
<td><p>Hello</p></td>
<td>
<p>Hello already in P</p>
</td>
<td>
<p>
This one has some naked text
<span>and some span wrapped text</span>
</p>
</td>
</tr>
</table>
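I'm using lxml etree in my project, but the library doesn't seem to have a "wrap" method or anything similar.
Now I'm thinking this might be a job for an XSLT transformation, but I'd like to avoid adding another layer of complexity and another dependency to my Python project.
The content of a td can be of any depth.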
Answer 0 (score: 1)
I haven't used the lxml package myself, but try the following:
from lxml import etree

def wrap(root):
    # Find <td> elements that do not have a <p> child element
    cells = etree.XPath("//td[not(p)]")(root)
    for cell in cells:
        # Create a new <p> element
        e = etree.Element("p")
        # Copy the cell's leading text into the <p> element
        e.text = cell.text
        # Clear the cell's text because it now lives in the <p> element
        cell.text = None
        # Move the cell's children into the <p> element
        # (so the <span> in the last cell of the input ends up nested in the <p>)
        for child in list(cell):
            # In lxml, append() moves the child (and its tail text) from the <td> to the <p>
            e.append(child)
        # Make the new <p> element the cell's only child
        cell.append(e)
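
To try the idea end to end, here is a minimal, self-contained sketch run against a shortened version of the sample input. The names `wrap_cells` and `SAMPLE`, and the extra " and a tail" text after the span, are my own additions for illustration. It only uses calls shared by lxml and the standard library's xml.etree.ElementTree, so as an assumption it falls back to the standard library if lxml is not installed. Treat it as a sketch rather than a definitive implementation.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: the stdlib API is compatible for the calls used below
    import xml.etree.ElementTree as etree


def wrap_cells(root):
    """Wrap the content of every <td> that has no direct <p> child in a new <p>."""
    # Materialize the list first so the tree is not modified while iterating
    for cell in list(root.iter("td")):
        if cell.find("p") is not None:
            continue  # already has a <p> child, leave it alone
        p = etree.Element("p")
        # The cell's leading text becomes the leading text of the <p>
        p.text = cell.text
        cell.text = None
        for child in list(cell):
            # Detach explicitly; the child keeps its tail text when re-appended
            cell.remove(child)
            p.append(child)
        cell.append(p)
    return root


# Hypothetical sample, shortened from the question's input
SAMPLE = """<table><tr>
<td>Hello</td>
<td><p>Hello already in P</p></td>
<td>This one has some naked text <span>and some span wrapped text</span> and a tail</td>
</tr></table>"""

root = wrap_cells(etree.fromstring(SAMPLE))
cells = root.findall(".//td")
print(etree.tostring(root).decode())
```

Note that a cell mixing a `<p>` with loose text is skipped by this check, just as it is by the `//td[not(p)]` XPath above, so that case would need extra handling.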