How do I wrap all the contents of a tag?

Time: 2017-05-20 11:38:55

Tags: python xml xslt

I have the following XML fragment:

<table>
  <tr>
    <td>Hello</td>
    <td>Hello</td>
    <td>
      <p>Hello already in P</p>
    </td>
    <td>
      This one has some naked text
      <span>and some span wrapped text</span>
    </td>
  </tr>
</table>

I want to wrap (in a p tag) the contents of every cell that is not already wrapped in a p tag. So the output would be:

<table>
  <tr>
    <td><p>Hello</p></td>
    <td><p>Hello</p></td>
    <td>
      <p>Hello already in P</p>
    </td>
    <td>
      <p>
        This one has some naked text
        <span>and some span wrapped text</span>
      </p>
    </td>
  </tr>
</table>

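I'm using lxml etree in my project, but the library doesn't seem to have a "wrap" method or anything similar.

Now I'm thinking this may be a job for an XSLT transformation, but I'd like to avoid adding another layer of complexity and extra dependencies to my Python project.

The contents of a td can be nested to any depth.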

1 Answer:

Answer 0: (score: 1)

I haven't used the lxml package myself, but try the following:

from lxml import etree


def wrap(root):
    # Find <td> elements that do not have a <p> child element
    cells = etree.XPath("//td[not(p)]")(root)
    for cell in cells:
        # Create the new <p> element
        e = etree.Element("p")
        # Copy the cell's leading text into the <p> element
        e.text = cell.text
        # Clear the cell's text because it now lives in the <p> element
        cell.text = None
        # Move the cell's children so they become the <p> element's children
        # (because the span on line 10 of the input file should be nested)
        for child in list(cell):
            # In lxml, append() moves the child from the <td> to the <p>
            e.append(child)
        # Set the new <p> element as the cell's only child
        cell.append(e)
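
To sanity-check the idea, here is a self-contained sketch that runs the same wrapping logic on a shortened version of the sample from the question. The names `wrap_cells` and `SAMPLE` are made up for illustration. The fallback to the standard library's `xml.etree.ElementTree` is an assumption added so the snippet also runs without lxml installed. The explicit `remove()` call is only there because ElementTree's `append()` does not move an element the way lxml's does.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: the stdlib ElementTree API is close enough for this sketch
    import xml.etree.ElementTree as etree

# Shortened, hypothetical version of the question's input
SAMPLE = """<table>
  <tr>
    <td>Hello</td>
    <td><p>Hello already in P</p></td>
    <td>This one has some naked text <span>and some span wrapped text</span></td>
  </tr>
</table>"""


def wrap_cells(root):
    """Wrap the contents of every <td> that has no direct <p> child in a new <p>."""
    # Materialize the list first so the tree is not modified while iterating it
    for cell in list(root.iter("td")):
        if cell.find("p") is not None:
            continue  # this cell already has a <p> child, leave it alone
        p = etree.Element("p")
        # Move the leading text of the cell into the new <p>
        p.text, cell.text = cell.text, None
        for child in list(cell):  # copy, because the cell changes inside the loop
            cell.remove(child)    # needed for ElementTree, harmless for lxml
            p.append(child)       # the child's tail text travels with it
        cell.append(p)
    return root


root = wrap_cells(etree.fromstring(SAMPLE))
result = etree.tostring(root, encoding="unicode")
print(result)
```

Because whole subtrees are moved rather than rebuilt, the depth of the td contents does not matter. Like the XPath version above, this skips any cell that already has a p child, even if that cell also contains naked text next to the p.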