lxml - How do I remove an element but not its content?

Time: 2015-03-15 20:59:28

Tags: python lxml

Let's say I have the following markup:

<div id="first">
  <div id="second">
    <a></a>
    <ul>...</ul>
  </div>
</div>

This is my code:

div_parents = root_element.xpath('//div[div]')

for div in reversed(div_parents):
    if len(div.getchildren()) == 1:
        # remove the second div and replace it with its content
        pass

I select each div that has a div child, and then I want to remove the child div if it is its parent's only child. The result should be:

<div id="first">
   <a></a>
   <ul>...</ul>
</div>

I wanted to do it like this:

div.replace(div.getchildren()[0], div.getchildren()[0].getchildren())

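Unfortunately, each of the two arguments to replace must be a single element. Is there anything easier than reassigning all the attributes of the first div to the second div and then swapping them? I ask because this part is easy: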

div.getparent().replace(div, div.getchildren()[0])

2 Answers:

Answer 0: (score: 1)

Consider using copy.deepcopy, as suggested in the docs.

For example:

import copy

div_parents = root_element.xpath('//div[div]')

for outer_div in div_parents:
    if len(outer_div.getchildren()) == 1:
        inner_div = outer_div[0]
        # Copy the children of inner_div to outer_div
        for e in inner_div:
            outer_div.append(copy.deepcopy(e))
        # Remove inner_div from outer_div
        outer_div.remove(inner_div)

Full code for testing:

import copy
import lxml.etree

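def pprint(e):
    # encoding="unicode" returns str rather than bytes, so it prints cleanly on Python 3
    print(lxml.etree.tostring(e, pretty_print=True, encoding="unicode"))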

xml = '''
<body>
    <div id="first">
      <div id="second">
        <a>...</a>
        <ul>...</ul>
      </div>
    </div>
</body>
'''

root_element = lxml.etree.fromstring(xml)
div_parents = root_element.xpath('//div[div]')

for outer_div in div_parents:
    if len(outer_div.getchildren()) == 1:
        inner_div = outer_div[0]
        # Copy the children of inner_div to outer_div
        for e in inner_div:
            outer_div.append(copy.deepcopy(e))
        # Remove inner_div from outer_div
        outer_div.remove(inner_div)

pprint(root_element)

Output:

<body>
    <div id="first">
      <a>...</a>
        <ul>...</ul>
      </div>
</body>

Note: wrapping the test input in a <body> tag is not necessary; I only used it to test multiple cases. The test code also works fine on your input.
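As a variation on the approach above: in lxml an element has a single parent, so appending an element that already sits in a tree moves it rather than copying it, and the deepcopy step can arguably be skipped. The following is a minimal sketch under that assumption. `unwrap_only_child` is a hypothetical helper name (not part of lxml), and the sketch ignores any text placed directly inside the inner div.

```python
import lxml.etree as etree


def unwrap_only_child(outer):
    """Hypothetical helper: if `outer` has exactly one child, hoist that
    child's children into `outer` and drop the child itself."""
    if len(outer) != 1:
        return False
    inner = outer[0]
    # list() takes a snapshot first, because append() moves each child
    # out of `inner` while we loop.
    for child in list(inner):
        outer.append(child)
    # Text directly inside `inner` (and its tail) is discarded here.
    outer.remove(inner)
    return True


root = etree.fromstring(
    '<body><div id="first"><div id="second"><a/><ul/></div></div></body>'
)

# reversed() handles deeper matches before their ancestors, as in the question.
for outer in reversed(root.xpath('//div[div]')):
    unwrap_only_child(outer)

result = etree.tostring(root, encoding="unicode")
print(result)
```

Moving instead of copying keeps the original element objects, so any references held elsewhere in the program still point into the tree.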

Answer 1: (score: 1)

I would just use list replacement (slice assignment):

from lxml.etree import fromstring, tostring

xml = """<div id="first">
  <div id="second">
    <a></a>
    <ul>...</ul>
  </div>
</div>"""


doc = fromstring(xml)
outer_divs = doc.xpath("//div[div]")

for outer in outer_divs:
    # Slice assignment replaces all of outer's children with the first child's children
    outer[:] = list(outer[0])


print(tostring(doc, encoding="unicode"))
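Note that the slice assignment above is applied to every matched div, whether or not the inner div is the only child. If the markup is parsed with lxml.html instead of lxml.etree, elements also offer a drop_tag() method, which removes the tag but keeps its children and text. Here is a sketch under that assumption, with the only-child check from the question added and the markup simplified from the question's example:

```python
import lxml.html

html = '<div id="first"><div id="second"><a></a><ul></ul></div></div>'
root = lxml.html.fromstring(html)

# reversed() processes deeper matches before their ancestors.
for outer in reversed(root.xpath('//div[div]')):
    if len(outer) == 1:
        # drop_tag() removes the inner <div> tag itself and leaves its
        # children (and text) in the parent at the same position.
        outer[0].drop_tag()

result = lxml.html.tostring(root, encoding="unicode")
print(result)
```

This only applies to trees built by lxml.html; plain lxml.etree elements do not have drop_tag().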