Question

I'm an absolute beginner. I'm trying to scrape a website with BeautifulSoup. I do get the HTML, but now I want to get all the divs with the class content_class.

Here is my attempt:

import requests
from BeautifulSoup import BeautifulSoup

#Request the page and parse the HTML
url = 'mywebsite'
response = requests.get(url)
html = response.content

#Beautiful Soup
soup = BeautifulSoup(html)
soup.find_all('div', class_="content_class")

However, this doesn't work. I get:

Traceback (most recent call last):
  File "scrape.py", line 11, in <module>
    soup.find_all('div', class_="content_class")
TypeError: 'NoneType' object is not callable

What am I doing wrong?

Answer 1

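You are using BeautifulSoup version three, but appear to be following the documentation for BeautifulSoup version four. The Element.find_all() method is only available in the latest major version (it is called Element.findAll() in version 3).

I strongly recommend that you upgrade: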

pip install beautifulsoup4

and

from bs4 import BeautifulSoup

Version 3 stopped receiving updates in 2012; it is now badly out of date.
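To make the upgrade concrete, here is a minimal sketch of the same search under version 4. The inline HTML string is made up so the example runs without a network request, and the explicit "html.parser" argument is my addition (it is the parser bundled with Python); swap in your own response.content once it works.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page
html = """
<html><body>
  <div class="content_class">first</div>
  <div class="other">skip me</div>
  <div class="content_class extra">second</div>
</body></html>
"""

# Name the parser explicitly; html.parser ships with Python
soup = BeautifulSoup(html, "html.parser")

# class_ matches any div whose class list includes content_class
divs = soup.find_all("div", class_="content_class")
texts = [div.get_text(strip=True) for div in divs]
print(texts)
```

Note that find_all() returns a list of tags, so you need to assign the result and loop over it; calling it on its own line, as in the question, discards the matches.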

Answer 2

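You get this error because the BeautifulSoup version you are using (version 3) has no method named "find_all"; the method is called "findAll" there. This code should help: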

 soup.findAll('div', {'class': 'content_class'})
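In case you wonder why the message is a TypeError rather than "no such method": as far as I know, version 3 treats an unknown attribute name as a request for a child tag with that name, so soup.find_all looks for a <find_all> tag, finds none, and gives back None, which you then try to call. The toy class below only imitates that behavior to illustrate the idea. ToyTag and its contents are invented for this sketch and are not the library's actual code.

```python
class ToyTag:
    """Toy stand-in (not the real library class) for a version 3 tag."""

    def __init__(self, children):
        # Maps child tag name -> child object
        self.children = children

    def find(self, name):
        # Return the named child, or None when there is no such child
        return self.children.get(name)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails
        if name.startswith("__"):
            raise AttributeError(name)
        # Unknown names fall back to a child-tag search
        return self.find(name)


soup = ToyTag({"title": "a title tag"})
print(soup.title)     # found as a child
print(soup.find_all)  # no such child, so this is None

try:
    soup.find_all("div")  # calling None raises TypeError
except TypeError as exc:
    message = str(exc)
    print(message)
```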

Web scraping with BeautifulSoup: TypeError: 'NoneType' object is not callable
