How to filter on multiple classes with BeautifulSoup4

Date: 2016-01-19 09:31:41

Tags: python beautifulsoup

from bs4 import BeautifulSoup

html = """
    <div class="aa bb"></div>
    <div class="aa ccc"></div>
    <div class="aa"></div>
"""


def find(aclass):
    print(aclass)
    return aclass != "bb"

soup = BeautifulSoup(html, 'lxml')

div = soup.find_all('div', attrs={'class': find})

print(div)

I only want the div whose class is exactly 'aa', not 'aa bb' or any other combination. Please help! Thanks!

2 answers:

Answer 0: (score: 4)

This is answered in the question "BeautifulSoup webscraping find_all( ): finding exact match".

The following returns only the tags whose class is exactly 'aa'.

div = soup.find_all(lambda tag: tag.name == 'div' and tag.get('class') == ['aa'])
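The lambda works because BeautifulSoup treats class as a multi-valued attribute, so tag.get('class') returns a list such as ['aa', 'bb'] and comparing it with ['aa'] is an exact match. A minimal self-contained sketch of the same idea follows; the helper name has_only_class_aa is made up for illustration, and 'html.parser' is used instead of 'lxml' only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

html = """
    <div class="aa bb"></div>
    <div class="aa ccc"></div>
    <div class="aa"></div>
"""

# 'html.parser' ships with Python; the question uses 'lxml' instead.
soup = BeautifulSoup(html, "html.parser")


def has_only_class_aa(tag):
    """Hypothetical helper: True for <div> tags whose class list is exactly ['aa']."""
    return tag.name == "div" and tag.get("class") == ["aa"]


# Exact match: only the div whose class attribute is just "aa".
exact = soup.find_all(has_only_class_aa)

# For contrast: class_="aa" matches every div that has "aa" among its classes.
loose = soup.find_all("div", class_="aa")

print(exact)
print(len(loose))
```

The named function is interchangeable with the lambda above; it is just easier to reuse and to test.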

Answer 1: (score: 2)

You can also use a simple CSS selector:

soup.select("div[class=aa]")

Demo:

>>> from bs4 import BeautifulSoup
>>> 
>>> html = """
...     <div class="aa bb"></div>
...     <div class="aa ccc"></div>
...     <div class="aa"></div>
... """
>>> soup = BeautifulSoup(html, 'lxml')
>>> 
>>> for elm in soup.select("div[class=aa]"):
...     print(str(elm))
... 
<div class="aa"></div>
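As a rough sketch of why the attribute selector is the right tool here: [class=aa] compares the whole class attribute value, while the more common class selector div.aa matches any div that has "aa" among its classes. The example below assumes a BeautifulSoup version whose select() supports both forms, and uses 'html.parser' rather than 'lxml' only to keep it dependency-free.

```python
from bs4 import BeautifulSoup

html = """
    <div class="aa bb"></div>
    <div class="aa ccc"></div>
    <div class="aa"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Attribute selector: the full class attribute value must equal "aa".
exact_css = soup.select('div[class="aa"]')

# Class selector: "aa" only has to be one of the classes.
any_css = soup.select("div.aa")

print(len(exact_css), len(any_css))
```

Because the attribute selector compares the full attribute value, it is also sensitive to class order when more than one class is listed.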