Question

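I am very new to web scraping. I read about BeautifulSoup and tried to use it, but I am unable to extract the text for the given class name "company-desc-and-sort-container". I am not even able to extract the title from the HTML page. Here is the code I tried: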

from BeautifulSoup import BeautifulSoup
import requests

url= 'http://fortune.com/best-companies/'    
r = requests.get(url)

soup = BeautifulSoup(r.text)

#print soup.prettify()[0:1000]
print soup.find_all("title")

letters = soup.find_all("div", class_="company-desc-and-sort-container")

I get the following error:

 print soup.find_all("title")
TypeError: 'NoneType' object is not callable

Answer 1

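You are using BeautifulSoup version 3, which is not only no longer maintained but also has no find_all() method. And since dot notation is used as a shortcut for find(), BeautifulSoup tries to find an element with the tag name "find_all", which results in None. It then executes None("title"), which results in:

TypeError: 'NoneType' object is not callable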
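
The mechanism can be reproduced without the library. The following is a minimal Python 3 sketch, not the real BeautifulSoup 3 source: the `Tag` class and its `children` dictionary are hypothetical stand-ins, assuming only that unknown attribute names are forwarded to find(), which returns None when nothing matches.

```python
class Tag:
    """Hypothetical stand-in for a BeautifulSoup 3 tag (not the real class)."""

    def __init__(self, children):
        self.children = children  # maps child tag name -> text

    def find(self, name):
        # Return the first matching child, or None when nothing matches.
        return self.children.get(name)

    def __getattr__(self, name):
        # Python calls this only for attributes that do not exist,
        # so every unknown name is treated as a child tag lookup.
        return self.find(name)


soup = Tag({"title": "Best Companies"})

print(soup.title)     # an existing child tag
print(soup.find_all)  # no such method and no <find_all> tag, so None

try:
    soup.find_all("title")  # evaluates to None("title")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Because the attribute lookup itself succeeds (with None), the failure only appears at the call, which is why the message mentions 'NoneType' rather than a missing method.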

Upgrade to BeautifulSoup version 4. Replace:

from BeautifulSoup import BeautifulSoup

with:

from bs4 import BeautifulSoup

Make sure the beautifulsoup4 package is installed:

pip install beautifulsoup4
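
After switching to bs4, the two lookups from the question should work as written. Here is a small Python 3 sketch of that, assuming beautifulsoup4 is installed. The HTML string is made up for illustration and stands in for `r.text`; the real page may be structured differently or may fill in its content with JavaScript.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the downloaded page (r.text in the question).
html = """
<html>
  <head><title>Best Companies</title></head>
  <body>
    <div class="company-desc-and-sort-container">Sample description</div>
  </body>
</html>
"""

# Naming the parser explicitly avoids a warning in BeautifulSoup 4.
soup = BeautifulSoup(html, "html.parser")

# find_all() exists in version 4 and returns a list-like result of tags.
titles = soup.find_all("title")
containers = soup.find_all("div", class_="company-desc-and-sort-container")

print(titles[0].get_text())
for div in containers:
    print(div.get_text(strip=True))
```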

Answer 2

soup.find_all("title")

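does not find a title tag and returns "None". Also, if the "find_all" method did find something, it would return a list, and you would get a different error. You cannot print a list. Use the "find" method only; that gives you the first title tag.

Also, does the HTML page even have a title tag? Search for it, and print only when the result is not None.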
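
The "search first, print only if something was found" advice could look like the sketch below. It is written for Python 3 and assumes BeautifulSoup 4 (bs4), where find() returns None when nothing matches. The helper name `title_text` and both HTML strings are made up for illustration.

```python
from bs4 import BeautifulSoup


def title_text(html):
    """Return the text of the first <title> tag, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("title")  # first match, or None when nothing matches
    if title is None:
        return None
    return title.get_text()


with_title = "<html><head><title>Best Companies</title></head></html>"
without_title = "<html><body><p>No title here</p></body></html>"

for page in (with_title, without_title):
    text = title_text(page)
    if text is not None:  # print only when a title was found
        print(text)
```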

Unable to extract text from an HTML page in Python
