Question

I am using this snippet as part of a Python program to scrape a particular website (see the code below). To my surprise, the output is not HTML. I am using Python 3.4.

   import urllib.request as ur
   user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
   headers = { 'User-Agent' : user_agent }

   s = ur.urlopen('http://www.nairaland.com')
   pl = s.read()
   print(pl)

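The output of this code is:

b''

instead of the expected HTML. Please help me fix this code; I need to use the HTML in another part of the program. Thanks in advance.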

Answer 1

The excellent requests library returns the correct HTML:

import requests
s = requests.get('http://www.nairaland.com')
pl = s.text
print(pl)

Answer 2

Actually, according to this link, this may be a problem with your request headers.

Try:

>>> request = urllib2.Request(url, headers={'accept': '*/*'})
>>> urllib2.urlopen(request).read()

My Python code outputs the wrong HTML data
