Question

I am running Beautiful Soup 4.5 with Python 3.4 on Windows 7. Here is my script:

from bs4 import BeautifulSoup
import urllib3

http = urllib3.PoolManager()

url = 'https://scholar.google.com'
response = http.request('GET', url)
html2 = response.read()
soup = BeautifulSoup([html2])

print (type(soup))

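Here is the error I get:

TypeError: expected string or buffer

I have looked into this and there does not seem to be any fix other than going back to an older version of Beautiful Soup, which I do not want to do. Any help would be appreciated.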

Answer 1

I am not sure why you are putting the HTML string into a list:

soup = BeautifulSoup([html2])

Replace it with:

soup = BeautifulSoup(html2)
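
To see the difference without touching the network, here is a minimal sketch. The markup in html2 is a made-up stand-in for the bytes your request returns, and I pass "html.parser" only to keep the run free of parser warnings. Which exception the list form raises may depend on your Beautiful Soup version, so the sketch catches it broadly:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the bytes returned by the HTTP request.
html2 = b"<html><head><title>Example</title></head><body><p>Hi</p></body></html>"

try:
    # Wrapping the markup in a list, as in the question.
    BeautifulSoup([html2], "html.parser")
except Exception as exc:
    # The question reports a TypeError with Beautiful Soup 4.5;
    # other versions may raise a different exception type.
    print("list input rejected:", type(exc).__name__)

# Passing the bytes (or a str) directly is what the constructor expects.
soup = BeautifulSoup(html2, "html.parser")
print(soup.title.string)
```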

Alternatively, you can pass the response itself, which is a file-like object, and BeautifulSoup will read it for you:

response = http.request('GET', url)
soup = BeautifulSoup(response)
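
The file-like route works for anything that has a read() method. A small sketch, using io.BytesIO as a stand-in for the response (the name fake_response and its markup are invented for illustration):

```python
import io

from bs4 import BeautifulSoup

# io.BytesIO plays the role of the HTTP response here: it is just an
# object with a read() method that returns the markup as bytes.
fake_response = io.BytesIO(b"<html><body><p>Hello</p></body></html>")

# BeautifulSoup calls read() on file-like input before parsing.
soup = BeautifulSoup(fake_response, "html.parser")
print(soup.p.get_text())
```

With urllib3 specifically, the body is also available as response.data, which may be the safer thing to pass if the response has already been read by the time you build the soup.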

It is also a good idea to specify the parser explicitly:

soup = BeautifulSoup(html2, "html.parser")
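
The reason this matters: when no parser is named, Beautiful Soup picks one from whatever happens to be installed, so the same script can build a different tree on another machine. A short sketch with the parser pinned to the standard library's "html.parser" (the markup is again a made-up example):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page.
html2 = b'<html><body><a href="/a">A</a> <a href="/b">B</a></body></html>'

# Naming the parser keeps the result independent of which optional
# parsers (such as lxml or html5lib) are installed.
soup = BeautifulSoup(html2, "html.parser")

# Collect the href attribute of every link in document order.
links = [a["href"] for a in soup.find_all("a")]
print(links)
```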
