I want to remove a tag (one with a specific ID) from an HTML page. For example:
<div id="id1" >
"Contents here"
</div>
<div id="id2"> ...</div>
If I want to remove the first tag but not the second, how do I do that?
Answer 0: (score: 3)
In [32]: from BeautifulSoup import BeautifulSoup
In [33]: doc = '''<div id="id1" >
"Contents here"
</div>
<div id="id2"> ...</div>'''
In [34]: soup = BeautifulSoup(doc)
In [35]: id1 = soup.find('div', id='id1')
In [36]: print soup
<div id="id1">
"Contents here"
</div>
<div id="id2"> ...</div>
In [37]: id1.extract()
Out[37]:
<div id="id1">
"Contents here"
</div>
In [38]: print soup
<div id="id2"> ...</div>
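The session above uses Python 2 and the old BeautifulSoup 3 import. A rough equivalent for Python 3 might look like the sketch below. It assumes the `beautifulsoup4` package is installed and uses the standard-library `html.parser` backend; `remove_by_id` is a hypothetical helper name for illustration, not part of the library.

```python
from bs4 import BeautifulSoup  # assumption: beautifulsoup4 is installed

doc = '''<div id="id1" >
"Contents here"
</div>
<div id="id2"> ...</div>'''


def remove_by_id(html, element_id):
    """Return the markup with the element carrying element_id removed.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(id=element_id)
    if tag is not None:  # find() returns None when nothing matches
        tag.decompose()  # drop the tag and everything inside it
    return str(soup)


result = remove_by_id(doc, "id1")
print(result)
```

`extract()`, as used in the answer, also works in bs4 and returns the removed tag, which is handy if you want to reuse it elsewhere. `decompose()` just destroys it. Either way the whitespace between the two divs stays in the document, so the output may start with a blank line.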