Question

Can I combine these two blocks into one?

Edit: is there any method other than a loop like the one in Yacoby's answer?

for tag in soup.findAll(['script', 'form']):
    tag.extract()

for tag in soup.findAll(id="footer"):
    tag.extract()

Also, can I combine multiple blocks into one?

for tag in soup.findAll(id="footer"):
    tag.extract()

for tag in soup.findAll(id="content"):
    tag.extract()

for tag in soup.findAll(id="links"):
    tag.extract()

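Or maybe there is some lambda expression that lets me check whether a value is in an array, or some other simpler method.

Also, how do I find tags by their class attribute, given that class is a reserved keyword?

Edit: this part is solved by soup.findAll(attrs={'class': 'noprint'}); the attempt below is a syntax error: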

for tag in soup.findAll(class="noprint"):
    tag.extract()

Answer 1

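You can pass a function to .findAll(), like this:

# tag.get('id') returns None for tags without an id, which avoids a KeyError
soup.findAll(lambda tag: tag.name in ['script', 'form'] or tag.get('id') == "footer")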

However, it may be better to build a list of tags first and then iterate over it:

tags = soup.findAll(['script', 'form'])
tags.extend(soup.findAll(id="footer"))

for tag in tags:
    tag.extract()

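If you want to filter on multiple ids, you can use:

# tag.get('id') is None when the attribute is missing, so no has_key() check is needed
for tag in soup.findAll(lambda tag: tag.get('id') in ['footer', 'content', 'links']):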
    tag.extract()

A more specific approach is to pass the lambda as the id argument:

for tag in soup.findAll(id=lambda value: value in ['footer', 'content', 'links']):
    tag.extract()
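
As a rough sketch of how the name check and the id check might be folded into a single pass, the snippet below uses a named filter function instead of a lambda. It assumes the bs4 package and the built-in "html.parser" backend; the sample HTML and the names UNWANTED_NAMES, UNWANTED_IDS, is_unwanted and strip_unwanted are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used only for this illustration.
HTML = """
<html><body>
<script>var x = 1;</script>
<form action="/search"><input name="q"></form>
<div id="content"><p>Main text</p></div>
<div id="links"><a href="/a">A</a></div>
<div id="footer">Footer</div>
<p id="keep">Keep me</p>
</body></html>
"""

UNWANTED_NAMES = {"script", "form"}
UNWANTED_IDS = {"footer", "content", "links"}


def is_unwanted(tag):
    """Return True for tags matched either by name or by id."""
    # tag.get() returns None when the attribute is absent, so no KeyError.
    return tag.name in UNWANTED_NAMES or tag.get("id") in UNWANTED_IDS


def strip_unwanted(html):
    """Parse html and remove every tag that is_unwanted() accepts."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a fully built list, so removing tags while
    # looping over the result is safe.
    for tag in soup.find_all(is_unwanted):
        tag.extract()
    return soup


if __name__ == "__main__":
    print(strip_unwanted(HTML))
```

Keeping the names and ids in two sets means a new criterion only needs a new set entry rather than another loop.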

Answer 2

I don't know whether BeautifulSoup can do this more elegantly, but you can merge the two loops like this:

for tag in soup.findAll(['script', 'form']) + soup.findAll(id="footer"):
    tag.extract()
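
One possibly more compact alternative, not taken from any of the answers, is a CSS selector group passed to select(). This sketch assumes a reasonably recent bs4 release, where select() accepts comma-separated selectors; the sample markup is invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for this illustration.
html = (
    '<div><script>1</script><form></form>'
    '<p id="footer">f</p><p class="noprint">n</p>'
    '<p id="body">text</p></div>'
)
soup = BeautifulSoup(html, "html.parser")

# A selector group matches any of its comma-separated parts:
# two tag names, one id and one class.
for tag in soup.select("script, form, #footer, .noprint"):
    tag.extract()

# Collect the ids of the paragraphs that survived.
remaining = [p.get("id") for p in soup.find_all("p")]
print(remaining)
```

Because the class is written as .noprint inside the selector string, the reserved-keyword problem does not come up here.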

You can search by class like this (see the documentation):

for tag in soup.findAll(attrs={'class': 'noprint'}):
    tag.extract()

Answer 3

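The answer to the second part of the question is right there in the documentation:

Searching by CSS class

The attrs argument would be a pretty obscure feature were it not for one thing: CSS. It is very useful to search for a tag that has a certain CSS class, but the name of the CSS attribute, class, is also a Python reserved word.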

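You could search by CSS class with soup.find("tagName", { "class" : "cssClass" }), but that is a lot of code for such a common operation. Instead, you can pass a string for attrs instead of a dictionary. The string will be used to restrict the CSS class.

# BeautifulSoup 3 import; with bs4 use "from bs4 import BeautifulSoup"
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup("""Bob's <b>Bold</b> Barbeque Sauce now available in
                        <b class="hickory">Hickory</b> and <b class="lime">Lime</b>""")

soup.find("b", { "class" : "lime" })
# <b class="lime">Lime</b>

soup.find("b", "hickory")
# <b class="hickory">Hickory</b>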

Answer 4

We can pass class_ to filter on class values, for example links = soup.find_all('a', class_='external'):

from bs4 import BeautifulSoup
from urllib.request import urlopen

# Download the page first, then parse it once the connection is closed
with urlopen('http://www.espncricinfo.com/') as f:
    raw_data = f.read()

# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(raw_data, 'lxml')
links = soup.find_all('a', class_='external')
for link in links:
    print(link)
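
The example above needs network access and the lxml package. As a minimal offline sketch of the same idea, the snippet below runs both class lookups (class_ and the attrs dictionary) against an inline string. It assumes bs4 with the built-in "html.parser"; the markup and URLs are placeholders invented for the illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the hrefs are placeholders.
html = """
<a class="external" href="https://example.com">one</a>
<a class="external noprint" href="https://example.org">two</a>
<a class="internal" href="/home">three</a>
<p class="noprint">four</p>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ sidesteps the reserved word; it matches any tag that has
# "external" among its (possibly several) classes.
external = soup.find_all("a", class_="external")

# The attrs dictionary form works too, and is not limited to <a> tags.
noprint = soup.find_all(attrs={"class": "noprint"})

external_text = [tag.get_text() for tag in external]
noprint_text = [tag.get_text() for tag in noprint]
print(external_text, noprint_text)
```

Both forms match on individual class names, so the link with class "external noprint" shows up in both result lists.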

Can I combine two 'findAll' search blocks in BeautifulSoup into one?
