Question

Hi all... I want to read the "Most Popular" column on http://www.nydailynews.com/.

In Chrome, the markup looks like this:

[screenshot]

So I did this:

import urllib2
from bs4 import BeautifulSoup

url = "http://www.nydailynews.com/"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())

print soup.find_all(id = 'most-read-content')

But it returns nothing.

What is the problem here? Is it because "Most Popular" is actually Flash or something?

Thanks.

Answer 1

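The problem starts earlier, when the actual text is downloaded. With your code, page.read() returns a blank result.

The first line of the page source contains content="text/html; charset=utf-8", but either that is not actually the case, or the code is not set up to read utf-8.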
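One way to tell these cases apart is to look at the raw bytes before handing them to BeautifulSoup. The following is only a minimal sketch: `describe_body` is a hypothetical helper written for illustration, not part of urllib2 or BeautifulSoup. It relies on the fact that a gzip stream always begins with the two bytes 0x1f 0x8b.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def describe_body(raw):
    """Roughly classify a downloaded body (bytes) for debugging.

    Hypothetical helper for illustration only.
    """
    if not raw:
        return "empty"
    if raw[:2] == GZIP_MAGIC:
        return "gzip-compressed"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    return "valid UTF-8 text"
```

Calling describe_body(page.read()) on the response from the question should indicate whether the body is empty, compressed, or simply in a different encoding.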

Answer 2

"The problem is that the server returns gzip-compressed data."
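If that is the case, the body has to be decompressed before parsing. Here is a rough sketch of how that might look, assuming the server announces the compression in a Content-Encoding header. The names `decode_body` and `fetch_html` are made up for this example, and the import fallback is only there so the same code runs on Python 2 (as in the question) and Python 3.

```python
import gzip
import io

try:
    from urllib2 import urlopen  # Python 2, as in the question
except ImportError:
    from urllib.request import urlopen  # Python 3


def decode_body(raw, content_encoding):
    """Decompress raw if the server declared gzip, else return it unchanged."""
    if content_encoding and "gzip" in content_encoding.lower():
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return raw


def fetch_html(url):
    """Download url and return the (decompressed) body as bytes."""
    response = urlopen(url)
    # info() returns the response headers on both Python 2 and Python 3
    encoding = response.info().get("Content-Encoding")
    return decode_body(response.read(), encoding)
```

With this in place, the question's code would become soup = BeautifulSoup(fetch_html(url)), after which find_all(id='most-read-content') has real HTML to search, provided the element is present in the served markup.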

See the following for reference: