I am trying to fetch the following URL: ow dot ly / LApK30cbLKj. The URL works, but I get an HTTP 404 error:
my_url = 'ow' + '.ly/LApK30cbLKj' # SO won't accept an ow.ly url
headers = {'User-Agent': user_agent}
request = urllib2.Request(my_url, "", headers)
response = None
try:
    response = urllib2.urlopen(request)
except urllib2.HTTPError, e:
    print '+++HTTPError = ' + str(e.code)
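What can I do to get this URL with an HTTP 200 status, as I do when I visit it in a browser?
Answer 0: (score: 0)
Your example works for me, except that you need to add the http:// prefix: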
my_url = 'http://ow' + '.ly/LApK30cbLKj'
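Answer 1: (score: 0)
You need to specify the URL protocol. When you visit the URL in a browser, the protocol defaults to HTTP. urllib2, however, does not do this for you: you need to add http:// at the beginning of the URL, otherwise this error is raised: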
ValueError: unknown url type: ow.ly/LApK30cbLKj
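If the URLs come from somewhere you don't control, one way to avoid this error is to add the protocol only when it is missing. The following is a minimal sketch, not part of urllib2: the helper name `ensure_scheme` and its `default_scheme` parameter are made up for illustration, and it assumes that plain HTTP is an acceptable default for your URLs.

```python
import re

# Matches a leading URL scheme such as "http://" or "ftp://".
_SCHEME_RE = re.compile(r'^[A-Za-z][A-Za-z0-9+.\-]*://')


def ensure_scheme(url, default_scheme='http'):
    """Return url unchanged if it already starts with a scheme,
    otherwise prepend default_scheme (hypothetical helper)."""
    url = url.strip()
    if _SCHEME_RE.match(url):
        return url
    # Drop any leading slashes so "//host/path" does not become "http:////host/path".
    return default_scheme + '://' + url.lstrip('/')


my_url = ensure_scheme('ow' + '.ly/LApK30cbLKj')
```

The result can then be passed to urllib2.Request as in the question.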
Answer 2: (score: 0)
As @enjoi mentioned, I used requests:
import requests
result = None
try:
    result = requests.get(agen_cont.source_url)
except requests.exceptions.Timeout as e:
    print '+++timeout exception: '
    print e
except requests.exceptions.TooManyRedirects as e:
    print '+++ too many redirects exception: '
    print e
except requests.exceptions.RequestException as e:
    print '+++ request exception: '
    print e
except Exception:
    import traceback
    print '+++generic exception: ' + traceback.format_exc()
# A requests Response is falsy for 4xx/5xx status codes, so this also skips error pages.
if result:
    final_url = result.url
    print final_url
    response = result.content
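Since a short link like this one works by redirecting, it can help to see every hop and not only the final URL. Here is a rough sketch that relies on the `history` and `url` attributes of a requests response; the function name `redirect_chain` is hypothetical, and it is written without print statements so it should run on either Python 2 or 3.

```python
def redirect_chain(response):
    """Return every URL visited, in order, ending with the final URL.

    Expects an object shaped like a requests Response (hypothetical helper).
    """
    # response.history holds the intermediate redirect responses, oldest first;
    # response.url is the URL that was finally served.
    return [hop.url for hop in response.history] + [response.url]


# Usage, assuming `result` is the response obtained in the code above:
#     for url in redirect_chain(result):
#         print(url)
```

If the list has only one entry, no redirect took place.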