Question

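I'm new to Python and am currently trying to build a web scraper to learn the language. I want to save all the listings from https://www.notebooksbilliger.de/studentenprogramm/notebooks, that is, every notebook in the site's student-discount category.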

from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

my_url = 'https://www.notebooksbilliger.de/studentenprogramm/notebooks'

uClient = urlopen(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll("div", {"class":"mouseover clearfix"})

I'm also experimenting in the console, but when I check the length of containers, this is the output I get:

>>> len(containers)
1

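That can't be right, because the page is set to show 50 listings. I tried searching with different arguments, but I always seem to find only one item, and then the search stops.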

I'm a bit lost right now and can't work out what is going wrong. Any help?

Regards :)

Answer 1

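Well, this is embarrassing.

Right after posting (in my defense, after a lot of searching and endless trial and error), I realized that an HTML class name cannot contain spaces, so mouseover clearfix is actually two separate classes. This works: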

containers = page_soup.findAll("div", {"class":"mouseover"})
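
To illustrate the difference, here is a minimal sketch run against made-up markup (the four divs below are hypothetical and only stand in for the real page, which is not reproduced here). It assumes a recent Beautiful Soup 4 release with CSS selector support. As far as I understand it, passing a string that contains a space as the class value makes Beautiful Soup compare it against the exact, whole class attribute, which may be why only one element matched; a single class name matches any element that carries it, and a CSS selector can require both classes regardless of order or extras.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not taken from the real site.
html = """
<div class="mouseover clearfix">A</div>
<div class="mouseover clearfix listing">B</div>
<div class="clearfix mouseover">C</div>
<div class="clearfix">D</div>
"""

page_soup = BeautifulSoup(html, "html.parser")

# A value containing a space is matched against the whole class attribute string,
# so only the div whose attribute is exactly "mouseover clearfix" is returned.
exact = page_soup.find_all("div", {"class": "mouseover clearfix"})

# A single class name matches every div that has it among its classes.
single = page_soup.find_all("div", {"class": "mouseover"})

# A CSS selector requires both classes, in any order, with extra classes allowed.
both = page_soup.select("div.mouseover.clearfix")

print(len(exact), len(single), len(both))
```

With this sample markup the three counts come out as 1, 3 and 3. If you need both classes to be present rather than just one, the select() form is probably the safer choice.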

Web scraping with Python: BeautifulSoup findAll() doesn't find everything
