Question

I am trying to scrape some information from a web page. It works on one page but not on another, where I only get None as the return value.

This code / web page works fine:

# https://realpython.com/beautiful-soup-web-scraper-python/
import requests
from bs4 import BeautifulSoup

URL = "https://www.monster.at/jobs/suche/?q=Software-Devel&where=Graz"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

name_box = soup.find_all("div", attrs={"class": "company"})
print(name_box)

But with this code / web page I only get None as the return value:

# https://www.freecodecamp.org/news/how-to-scrape-websites-with-python-and-beautifulsoup-5946935d93fe/

import requests
from bs4 import BeautifulSoup

URL = "https://www.bloomberg.com/quote/SPX:IND"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")


name_box = soup.find("h1", attrs={"class": "companyName__99a4824b"})
print(name_box)

Why is that?

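(At first I thought that the digits in the class name "companyName__99a4824b" on the second page meant the class name changes dynamically. That is not the case: when I refresh the page, the class name stays the same...)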

Answer 1

The reason you get None is that the Bloomberg page uses JavaScript to load its content while the user is on the page.

BeautifulSoup only gives you the HTML of the page as the server returned it, and that HTML does not contain a tag with the class companyName__99a4824b.

Only after the page has fully loaded in a browser does the HTML contain the required tag.
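
A small offline sketch of this effect. The HTML string below is a made-up stand-in for what requests.get() receives (the markup and the script are assumptions for illustration, not Bloomberg's actual page). The heading would only exist after a browser ran the script, so BeautifulSoup, which never executes JavaScript, does not find it:

```python
from bs4 import BeautifulSoup

# Hypothetical raw response: the <h1> is not in the markup, it would only be
# created later by the script when the page runs inside a browser.
RAW_HTML = """
<html>
  <head><title>Example quote page</title></head>
  <body>
    <div id="root"></div>
    <script>
      var h = document.createElement("h1");
      h.className = "companyName__99a4824b";
      h.textContent = "S&P 500 Index";
      document.getElementById("root").appendChild(h);
    </script>
  </body>
</html>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")

# The script is treated as plain text and never executed, so no <h1> exists.
name_box = soup.find("h1", attrs={"class": "companyName__99a4824b"})
print(name_box)  # None
```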

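If you want to scrape this data, you need to use something like Selenium, which you can instruct to wait until the required element on the page is ready.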
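
A minimal sketch of that approach, assuming Selenium 4 and a local Chrome installation. The function names fetch_rendered_html and extract_name are hypothetical helpers, not part of either library, and whether the site serves the page to an automated browser is not guaranteed:

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, css_selector, timeout=10):
    """Load url in a real browser and return the HTML once css_selector exists."""
    # Imported here so extract_name() below stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome is installed locally
    try:
        driver.get(url)
        # Block until the element is in the DOM; raises TimeoutException otherwise.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


def extract_name(html, class_name="companyName__99a4824b"):
    """Return the text of the <h1> with the given class, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    name_box = soup.find("h1", attrs={"class": class_name})
    return name_box.get_text(strip=True) if name_box else None


# Usage (opens a browser window and needs network access):
# html = fetch_rendered_html(
#     "https://www.bloomberg.com/quote/SPX:IND", "h1.companyName__99a4824b"
# )
# print(extract_name(html))
```

Keeping the parsing step separate means the BeautifulSoup code from the question can stay as it is; only the way the HTML is obtained changes.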

Answer 2

The website is blocking the scraper. Check the title:

print(soup.find("title"))

To get around this, you have to use a real browser that can run JavaScript. A tool called Selenium can do that for you.
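
The title check can be wrapped in a small helper, sketched here under assumptions: the phrases in BLOCK_HINTS are guesses at what a block page might say and should be adjusted to the title you actually see, and page_title and looks_blocked are hypothetical names.

```python
from bs4 import BeautifulSoup

# Assumed phrases; replace them with whatever the block page's title contains.
BLOCK_HINTS = ("robot", "captcha", "access denied")


def page_title(html):
    """Return the text of the <title> tag, or an empty string if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("title")
    return title.get_text(strip=True) if title else ""


def looks_blocked(html):
    """Heuristic: True if the title contains one of the assumed block phrases."""
    title = page_title(html).lower()
    return any(hint in title for hint in BLOCK_HINTS)


# Usage with the response from the question:
# page = requests.get(URL)
# print(page.status_code, page_title(page.content), looks_blocked(page.content))
```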

Web scraping / BeautifulSoup / sometimes returns nothing?
