How do I remove tags with a specific class?

Asked: 2018-05-02 01:28:51

Tags: python python-3.x beautifulsoup

I am using BeautifulSoup (Python 3.x) to parse an HTML page, and I am trying to get the text from its <p> tags. This is what I wrote:

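import requests
from bs4 import BeautifulSoup

def getBody(url):
    # Download the page and parse it with the built-in HTML parser.
    html_page = requests.get(url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    # Concatenate the text of every <p> tag.
    Con = "".join([p.text for p in soup.find_all("p")])
    #print(Con)
    return Con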

However, this also picks up the text of the HTML tag below. How can I exclude it?

<p class="notice">Comments are closed for this article.</p>

1 Answer:

Answer 0: (score: 1)

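You can use decompose() or extract() to remove those tags from the tree. decompose() destroys the tag, while extract() removes it and returns it.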

>>> from bs4 import BeautifulSoup
>>> html = '''
... <p>text</p>
... <p class="notice">Comments are closed for this article.</p>
... <p>text</p>
... <p class="notice">Comments are closed for this article.</p>
... <p>text</p>'''
>>> soup = BeautifulSoup(html, 'html.parser')
>>> for tag in soup.find_all('p', class_='notice'):
...     tag.decompose()
...
>>> soup

<p>text</p>

<p>text</p>

<p>text</p>
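
Applied to the question, the same idea might look like the sketch below. It works on an HTML string rather than a URL so that it runs without network access. The names body_text, skip_class, and sample are made up for illustration and do not come from the original post.

```python
from bs4 import BeautifulSoup

def body_text(html, skip_class="notice"):
    """Join the text of all <p> tags, skipping those with class skip_class."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop the unwanted paragraphs from the tree before collecting text.
    for tag in soup.find_all("p", class_=skip_class):
        tag.decompose()
    return "".join(p.get_text() for p in soup.find_all("p"))

# Hypothetical sample input modelled on the HTML in the question.
sample = (
    "<p>first</p>"
    '<p class="notice">Comments are closed for this article.</p>'
    "<p>second</p>"
)
print(body_text(sample))

# extract() also removes the tag, but hands it back in case it is still needed.
soup = BeautifulSoup(sample, "html.parser")
removed = [tag.extract() for tag in soup.find_all("p", class_="notice")]
print(removed[0].get_text())
```

In getBody from the question, the loop with decompose() would go right after the soup is created and before the join over soup.find_all("p").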