Question

I am using Python 3 and BeautifulSoup 4.

When I run the code below, it only prints the URL "www.google.com" instead of the XML. I can't figure out what is wrong with it.

from bs4 import BeautifulSoup
import urllib


html = "www.google.com"

soup = BeautifulSoup(html)


print (soup.prettify())

Answer 1

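BeautifulSoup only parses the string you give it; it does not download anything. You need to fetch the HTML first, using urllib.request (called urllib2 in Python 2) or a similar library.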

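from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 home of what used to be urllib2.urlopen

# urlopen() needs a full URL, including the scheme
html = urlopen("http://www.google.com").read()

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, "html.parser")

print(soup.prettify())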

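Edit: a side note on the module name. The Python 2 urllib documentation says that the urlopen() function "has been removed in Python 3 in favor of urllib2.urlopen()". In Python 3 itself, however, urllib2 no longer exists as a separate module; its functionality was merged into urllib.request. Since you have tagged this Python 3, urllib.request.urlopen() is the call to use.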
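
As a rough sketch of the fetch step on Python 3, assuming only the standard library: the URL in the question, "www.google.com", has no scheme, and urlopen() rejects such a string, so the hypothetical helper with_scheme() below prefixes one when it is missing. Both with_scheme() and fetch_html() are illustrative names, not part of urllib or BeautifulSoup, and the 10-second timeout and UTF-8 fallback are arbitrary assumptions.

```python
from urllib.request import urlopen


def with_scheme(url, default="http"):
    """Hypothetical helper: prefix a scheme when the URL has none."""
    return url if "://" in url else f"{default}://{url}"


def fetch_html(url, timeout=10):
    """Hypothetical helper: download a page and return its body as text."""
    with urlopen(with_scheme(url), timeout=timeout) as response:
        # Use the charset from the Content-Type header; assume UTF-8 if absent
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned text can then be handed to the parser, for example BeautifulSoup(fetch_html("www.google.com"), "html.parser"). Third-party libraries such as requests can handle the download step as well, if you prefer them.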