There are many questions with similar titles, but I am trying to remove a tag from the soup object itself.
I have a page that contains this div:
<div id="content">
I want to keep this<br /><div id="blah">I want to remove this</div>
</div>
I can select <div id="content"> with soup.find('div', id='content'), but I want to remove <div id="blah"> from it.
Answer 0 (score: 6)
The Tag.decompose method removes a tag from the tree.
So find the div tag:
div = soup.find('div', {'id':'content'})
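Loop over all children except the first:
for child in list(div)[1:]:
and try to decompose each child (plain text nodes may raise AttributeError, which is ignored):
    try:
        child.decompose()
    except AttributeError:
        pass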
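Putting it all together:
import bs4 as bs

content = '''<div id="content">
I want to keep this<br /><div id="blah">I want to remove this</div>
</div>'''
# Name the parser explicitly; omitting it makes bs4 guess one and emit a warning.
soup = bs.BeautifulSoup(content, 'html.parser')
div = soup.find('div', {'id': 'content'})
# Copy the children into a list first, since decompose() mutates the tree.
for child in list(div)[1:]:
    try:
        child.decompose()
    except AttributeError:
        # Plain text nodes may not support decompose(); skip them.
        pass
print(div)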
yields
<div id="content">
I want to keep this
</div>
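If only the inner div needs to go, as the question asks, the tag can also be looked up by its id and decomposed directly. A minimal sketch, assuming the same markup as above and the built-in 'html.parser' (the variable name blah is just for illustration):

```python
from bs4 import BeautifulSoup

content = '''<div id="content">
I want to keep this<br /><div id="blah">I want to remove this</div>
</div>'''

soup = BeautifulSoup(content, 'html.parser')
div = soup.find('div', id='content')

# Look up the nested div by id; find() returns None when nothing matches.
blah = div.find('div', id='blah')
if blah is not None:
    # decompose() removes the tag and everything inside it from the tree.
    blah.decompose()

print(div)
```

Unlike the loop above, this leaves the <br/> in place, since only the matched tag is removed.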
The equivalent of the loop above using lxml would be:
import lxml.html as LH
content = '''<div id="content">
I want to keep this<br /><div id="blah">I want to remove this</div>
</div>'''
root = LH.fromstring(content)
div = root.xpath('//div[@id="content"]')[0]
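# Iterate over a copy: the safer pattern when removing children from the tree.
for child in list(div):
    div.remove(child)
# encoding='unicode' returns a str rather than bytes.
print(LH.tostring(div, encoding='unicode'))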
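Note that remove() in lxml also drops the tail text that follows the removed element. If text after the inner div should survive, lxml.html elements offer drop_tree(), which keeps the tail. A rough sketch, assuming a slightly modified input (the trailing " and this too" is added here purely for illustration and is not in the original page):

```python
import lxml.html as LH

# Hypothetical variant of the markup with text after the inner div.
content = '''<div id="content">
I want to keep this<br /><div id="blah">I want to remove this</div> and this too
</div>'''

root = LH.fromstring(content)
div = root.xpath('//div[@id="content"]')[0]

# Select only the direct child div with id="blah".
for blah in div.xpath('./div[@id="blah"]'):
    # drop_tree() removes the element and its contents but keeps its tail text.
    blah.drop_tree()

result = LH.tostring(div, encoding='unicode')
print(result)
```

Whether the tail text matters depends on the page; for the markup in the question, where nothing but whitespace follows the inner div, remove() and drop_tree() should behave much the same.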