I am using the requests library to capture the URL that a request gets redirected to. Let me demonstrate with the following code:
import requests

try:
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'}
    response = requests.get('https://www.mooc-list.com/go.php?courseId=3502', timeout=3, headers=headers)
    response.raise_for_status()
except requests.exceptions.HTTPError as errh:
    print("Http Error:", errh)
except requests.exceptions.ConnectionError as errc:
    print("Error Connecting:", errc)
except requests.exceptions.Timeout as errt:
    print("Timeout Error:", errt)
except requests.exceptions.RequestException as err:
    print("Oops : Something Else", err)
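Then I get the following output:
Error Connecting: HTTPSConnectionPool(host='hub0.ecolearning.eu', port=443): Max retries exceeded with url: /course/smooc-step-by-step-2ed/ (Caused by ConnectTimeoutError(, 'Connection to hub0.ecolearning.eu timed out. (connect timeout=3)'))
However, when I try to print the URL with print(response.url), I get a NameError:
NameError: name 'response' is not defined
This basically means that the response object is never assigned when the connection fails, so I cannot read the URL history or the redirects.
I don't mind the connection failing, but I do want to extract the URL it was redirected to. Is there any workaround?
Thanks! :)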
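Answer 0: (score: 1)
Even if we cannot open the URL we are redirected to, we can still look for the Location value in the HTTP headers of the response that issued the redirect. So I chose to turn off automatic redirects in requests and build a new redirector: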
import requests

def Final_location(url):
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'}
        # allow_redirects=False makes requests return the redirect response itself
        response = requests.get(url, timeout=3, allow_redirects=False, headers=headers)
    except requests.exceptions.HTTPError as errh:
        print("Http Error:", errh)
        return url
    except requests.exceptions.ConnectionError as errc:
        print("Error Connecting:", errc)
        return url
    except requests.exceptions.Timeout as errt:
        print("Timeout Error:", errt)
        return url
    except requests.exceptions.RequestException as err:
        print("Oops : Something Else", err)
        return url
    # A Location header means there is another hop to follow
    if response.headers.get("Location"):
        return Final_location(response.headers.get("Location"))
    else:
        return response.url
    #Location = Final_location(response.headers.get("Location")) if response.headers.get("Location") else response.url
    #return Location

print(Final_location('https://www.mooc-list.com/go.php?courseId=3502'))
Output:
Error Connecting: HTTPSConnectionPool(host='hub0.ecolearning.eu', port=443): Max retries exceeded with url: /course/smooc-step-by-step-2ed/ (Caused by ProxyError('Cannot connect to proxy.', timeout('_ssl.c:817: The handshake operation timed out',)))
https://hub0.ecolearning.eu/course/smooc-step-by-step-2ed/#
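The recursive redirector above passes the Location value straight back into requests, which fails if a server sends a relative Location, and it has no limit on the number of hops. Here is a minimal iterative sketch of the same idea. The names final_location, session, max_hops and timeout are my own assumptions for illustration and do not come from the answer:

```python
from urllib.parse import urljoin

import requests


def final_location(url, session=None, max_hops=10, timeout=3):
    """Follow redirects by hand; return the last URL reached or attempted."""
    if session is None:
        session = requests.Session()
    for _ in range(max_hops):
        try:
            # Ask for one hop only, so a failure on the next host cannot
            # hide the redirect target we already know about.
            response = session.get(url, allow_redirects=False, timeout=timeout)
        except requests.exceptions.RequestException as err:
            print("Request failed:", err)
            return url  # the hop we could not open is still the target
        location = response.headers.get("Location")
        if not location:
            return response.url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(response.url, location)
    return url  # gave up after max_hops redirects
```

Passing a requests.Session lets the hops share cookies and connections. The max_hops limit is only a guard against redirect loops; pick whatever value suits your case.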
Answer 1: (score: 0)
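import requests

def ReqGet(url, params=None, headers=None):
    try:
        resp = requests.get(url, params=params, headers=headers, timeout=3)
        return resp
    except requests.exceptions.RequestException as e:
        # Return the exception itself; it carries the request that failed
        return e

res = ReqGet('https://www.mooc-list.com/go.php?courseId=3502')
print(res.request.url)
The error object holds the request for the last redirect that was attempted, so the URL can be read from that request.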
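To make that a bit more defensive, here is a small sketch that reads the URL from the exception only when a request is actually attached to it. The helper names url_from_error and last_attempted_url are hypothetical, and I am assuming the failure is a requests.exceptions.RequestException, which has a request attribute that may be None:

```python
import requests


def url_from_error(err):
    """Return the URL of the request that raised err, or None if unknown."""
    request = getattr(err, "request", None)
    return request.url if request is not None else None


def last_attempted_url(url, **kwargs):
    """Return the final URL on success, or the last URL tried on failure."""
    try:
        return requests.get(url, **kwargs).url
    except requests.exceptions.RequestException as err:
        # Fall back to the starting URL if the exception carries no request.
        return url_from_error(err) or url
```

For the URL in the question you would call last_attempted_url('https://www.mooc-list.com/go.php?courseId=3502', timeout=3).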