Question

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen(url)
bs = BeautifulSoup(html.read(), 'html5lib')

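After running a few times, the process gets stuck at BeautifulSoup(html.read(), 'html5lib'). I tried changing the parser from 'html5lib' to 'lxml' and 'html.parser', but the problem persists. Is there a bug in BeautifulSoup? How can I solve this?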

Update: I added some logging to the program, like this:

print('open the url')
html = urlopen(url)
print('create BeautifulSoup Object')
bs = BeautifulSoup(html.read(), 'html5lib')

The console prints create BeautifulSoup Object and then just sits there with a blinking cursor.

Answer 1

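I ran into the same problem and found that the program was actually stuck in html.read(), not in BeautifulSoup itself. This is probably because the urlopen() resource is not closed properly when something goes wrong with the response.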

You can change it like this:

# The with block closes the connection even if read() raises
with urlopen(url) as response:
    html = response.read()
bs = BeautifulSoup(html, "lxml")
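If the hang comes from a server that stops sending data, closing the response alone may not be enough, because urlopen() waits indefinitely by default. Below is a minimal sketch that adds a timeout, assuming that is the cause here. The helper name fetch_html and the 10-second default are my own choices for illustration, not something from the question.

```python
import socket
from urllib.error import URLError
from urllib.request import urlopen


def fetch_html(url, timeout=10.0):
    """Return the response body as bytes, or None if the request fails.

    fetch_html is a hypothetical helper; the 10-second default is an
    arbitrary assumption. The timeout bounds each blocking socket
    operation (connect, read), not the total download time.
    """
    try:
        # The with block closes the response even if read() raises.
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except (URLError, socket.timeout) as exc:
        # URLError covers connection failures; socket.timeout covers a
        # read that stalls after the connection was established.
        print(f'failed to fetch {url}: {exc}')
        return None
```

When the result is not None, it can be passed to BeautifulSoup(html, "lxml") exactly as above. With a timeout in place, a stalled read should surface as an error instead of a blinking cursor.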

Alternatively, you can use the requests package, which is better than urllib:

import requests

html = requests.get(url).text
bs = BeautifulSoup(html, "lxml")

Hope this solves your problem.