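I have some HTML markup, and I want to get rid of some <b> child elements of <center> elements (it's legacy markup...).
Problem: part of the containing <center> element's text disappears when I remove the children using Python and lxml.
Example program (with simplified illustrative markup):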
#!/usr/bin/env python3
from lxml import html, etree
from lxml.etree import tostring
html_snippet = """
<center>
<b>IT wisdoms</b>
<b>
for your <a href="#">brain</a>:
</b>
NEVER <a href="#">change a running system</a> before the holidays!
</center>"""
tree = html.fromstring(html_snippet)
center_elem = tree.xpath("//center")[0]
print('----- BEFORE -----')
print(tostring(center_elem, pretty_print=True, encoding='unicode'))
for elem in center_elem.xpath("b"):
    elem.getparent().remove(elem)
print('----- AFTER -----')
print(tostring(center_elem, pretty_print=True, encoding='unicode'))
Output:
----- BEFORE -----
<center>
<b>IT wisdoms</b>
<b>
for your <a href="#">brain</a>:
</b>
NEVER <a href="#">change a running system</a> before the holidays!
</center>
----- AFTER -----
<center>
<a href="#">change a running system</a> before the holidays!
</center>
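As you can see, the <b> children are gone, but the word NEVER has disappeared as well, while the <a> element and the text "before the holidays!" remain.
I can't figure out how to keep it!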
Answer 0 (score: 2)
Try using drop_tree() on the elements you want to eliminate:
from lxml import html, etree
# html_snippet is the same string defined in the question
tree = html.fromstring(html_snippet)
center_elem = tree.xpath("//center")[0]
print('----- BEFORE -----')
print(etree.tostring(center_elem, pretty_print=True, encoding='unicode'))
for elem in center_elem.xpath("b"):
    elem.drop_tree()
print('----- AFTER -----')
print(etree.tostring(center_elem, pretty_print=True, encoding='unicode'))
Returns:
----- BEFORE -----
<center>
<b>IT wisdoms</b>
<b>
for your <a href="#">brain</a>:
</b>
NEVER <a href="#">change a running system</a> before the holidays!
</center>
----- AFTER -----
<center>
NEVER <a href="#">change a running system</a> before the holidays!
</center>