Question

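I'm currently trying to fetch the HTML of a URL, and the requests module in Python raises an error.

What is the preferred way to handle the TooManyRedirects error raised by requests? How can I access the site's HTML?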

import requests

site = requests.get("http://www.hortonworks.com/blog/data-science-apacheh-hadoop-predicting-airline-delays")
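For reference, the error itself can be caught like any other requests exception, via `requests.exceptions.TooManyRedirects`. The sketch below is self-contained: instead of the Hortonworks URL it assumes a hypothetical local server (`LoopHandler`) that redirects every request back to itself, and a hypothetical helper `fetch_html` that returns `None` when the redirect limit is hit. `Session.max_redirects` (30 by default) is lowered to 5 only so the demo fails quickly.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class LoopHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: every path redirects back to itself."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_html(session, url):
    """Hypothetical helper: return the page HTML, or None on a redirect loop."""
    try:
        resp = session.get(url, timeout=5)
    except requests.exceptions.TooManyRedirects:
        return None
    resp.raise_for_status()
    return resp.text


server = HTTPServer(("127.0.0.1", 0), LoopHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
loop_url = f"http://127.0.0.1:{server.server_port}/loop"

with requests.Session() as session:
    session.trust_env = False  # bypass any proxy env vars for the local demo
    session.max_redirects = 5  # default is 30; lowered so the demo fails fast
    result = fetch_html(session, loop_url)

print(result)  # None: the redirect loop was caught instead of crashing
server.shutdown()
```

Catching the exception only stops the crash; it does not tell you why the server keeps redirecting.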

Answer 1

The usual way to disable redirects is to pass allow_redirects=False, for example:

site = requests.get(url,allow_redirects=False)

However, this does not actually solve the problem.
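Even so, allow_redirects=False is handy for diagnosing where a redirect points. The sketch below assumes a hypothetical local server (`RedirectHandler`) where `/old` answers 301 with `Location: /new`. The first call returns the 3xx response itself, so its `Location` header can be inspected; the second follows the redirect normally, and `resp.history` records each hop.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /old redirects once to /new, which returns HTML."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"<html><body>new page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

with requests.Session() as session:
    session.trust_env = False  # bypass any proxy env vars for the local demo
    # allow_redirects=False returns the 3xx response, not the target page
    first = session.get(base + "/old", allow_redirects=False, timeout=5)
    print(first.status_code, first.is_redirect, first.headers["Location"])
    # Default behaviour: follow the redirect; history lists the hops taken
    final = session.get(base + "/old", timeout=5)
    print(final.status_code, [r.status_code for r in final.history], final.url)

server.shutdown()
```

On a looping site, repeating the first call and following the `Location` values by hand shows which URLs the server cycles through.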

Adding a browser-like User-Agent header fixes the redirect problem here, and the page source is fetched successfully.

Try this:

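import requests

# Send a browser-like User-Agent instead of the default "python-requests/x.y.z"
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"}

url = "http://www.hortonworks.com/blog/data-science-apacheh-hadoop-predicting-airline-delays"

site = requests.get(url, headers=headers)
html = site.text  # the page source
print(site.url)  # final URL after any redirects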

Output:

http://hortonworks.com/blog/data-science-apacheh-hadoop-predicting-airline-delays/
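Why a User-Agent can matter: some servers treat clients differently based on that header, and requests identifies itself as `python-requests/x.y.z` by default. The sketch below simulates that with a hypothetical local server (`UASniffingHandler`) that redirects any client whose User-Agent lacks "Mozilla" back to the same URL. This is an assumption about how such a site might behave, not a description of the real Hortonworks server; `try_fetch` is likewise a hypothetical helper.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
)


class UASniffingHandler(BaseHTTPRequestHandler):
    """Hypothetical server: loops clients whose User-Agent lacks 'Mozilla'."""

    def do_GET(self):
        if "Mozilla" not in self.headers.get("User-Agent", ""):
            self.send_response(302)
            self.send_header("Location", self.path)  # redirect to itself forever
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"<html><body>blog post</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def try_fetch(session, url, headers=None):
    """Return ('ok', html), or ('loop', None) if the redirect limit is hit."""
    try:
        resp = session.get(url, headers=headers, timeout=5)
    except requests.exceptions.TooManyRedirects:
        return "loop", None
    return "ok", resp.text


server = HTTPServer(("127.0.0.1", 0), UASniffingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/blog/post"

with requests.Session() as session:
    session.trust_env = False  # bypass any proxy env vars for the local demo
    session.max_redirects = 5  # fail fast instead of after the default 30 hops
    default_result = try_fetch(session, url)  # default python-requests UA
    browser_result = try_fetch(session, url, headers={"User-Agent": BROWSER_UA})

print(default_result)
print(browser_result)
server.shutdown()
```

Per-request `headers` are merged over the session's defaults, so only the `User-Agent` changes between the two calls.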

TooManyRedirects error from the requests module
