Python: How do I remove a tag with a specific id, together with its contents, from HTML?

Asked: 2012-12-16 09:39:39

Tags: python html

I want to remove a tag with a specific id from an HTML page. For example:

<div id="id1" >
      "Contents here"
</div>

<div id="id2"> ...</div>

If I want to remove the first tag but keep the second, how do I do that?

1 answer:

Answer 0: (score: 3)

Use BeautifulSoup. Find the tag by its id, then call extract() on it to remove it and its contents from the tree:

In [32]: from BeautifulSoup import BeautifulSoup

In [33]: doc = '''<div id="id1" >
      "Contents here"
</div>
<div id="id2"> ...</div>'''

In [34]: soup = BeautifulSoup(doc)

In [35]: id1 = soup.find('div', id='id1')

In [36]: print soup
<div id="id1">
      "Contents here"
</div>
<div id="id2"> ...</div>

In [37]: id1.extract()
Out[37]: 
<div id="id1">
      "Contents here"
</div>

In [38]: print soup

<div id="id2"> ...</div>
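
The session above uses Python 2 and the old BeautifulSoup 3 import. On Python 3 with the bs4 package, the same idea could look roughly like the sketch below. The helper name remove_by_id and the choice of the built-in "html.parser" backend are assumptions made for illustration, not part of the original answer. It uses decompose(), which discards the tag outright, where extract() would hand the removed tag back to you.

```python
from bs4 import BeautifulSoup

doc = '''<div id="id1" >
      "Contents here"
</div>
<div id="id2"> ...</div>'''


def remove_by_id(html, element_id):
    """Return html with the element carrying element_id (and its contents) removed.

    Hypothetical helper for illustration only.
    """
    # "html.parser" is the stdlib-backed parser; lxml or html5lib would also work.
    soup = BeautifulSoup(html, "html.parser")
    target = soup.find(id=element_id)
    # find() returns None when nothing matches, so guard before removing.
    if target is not None:
        target.decompose()
    return str(soup)


result = remove_by_id(doc, "id1")
print(result)
```

If you still need the removed element afterwards (for example, to insert it somewhere else), call target.extract() instead of target.decompose().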