Getting a 404 error for a working URL with Python urllib2

Date: 2017-06-01 00:40:17

Tags: python http-status-code-404 urllib2

I am trying to fetch the following URL: ow dot ly / LApK30cbLKj. The URL works, but I get an HTTP 404 error:

    my_url = 'ow' + '.ly/LApK30cbLKj'     # SO won't accept an ow.ly url
    headers = {'User-Agent' : user_agent }
    request = urllib2.Request(my_url,"", headers)

    response = None
    try:
        response = urllib2.urlopen(request)
    except urllib2.HTTPError, e:
        print '+++HTTPError = ' + str(e.code)

What can I do to get the URL with an HTTP 200 status, the way I do when I visit it in a browser?

3 Answers:

Answer 0: (score: 0)

Your example works for me, except that you need to add http:// to the URL:

my_url = 'http://ow' + '.ly/LApK30cbLKj'
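
For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, the same fix can be sketched as follows. This is only an illustration, not the poster's code: the user_agent value and the fetch_status helper are assumptions made for the example. The sketch also leaves out the data argument, because in urllib a data value other than None (the question passes "") turns the request into a POST.

```python
# Python 3 sketch; urllib2 itself only exists in Python 2.
import urllib.error
import urllib.request

my_url = "http://ow" + ".ly/LApK30cbLKj"  # scheme included, as suggested above
user_agent = "Mozilla/5.0"  # hypothetical value; the question does not show its own

# No data argument is passed, so the request stays a GET.
request = urllib.request.Request(my_url, headers={"User-Agent": user_agent})

print(request.full_url)      # the URL that would be requested
print(request.get_method())  # the HTTP method that would be used


def fetch_status(req):
    """Send the request and return the HTTP status code (needs network access)."""
    try:
        with urllib.request.urlopen(req) as response:
            return response.status
    except urllib.error.HTTPError as e:
        # 4xx/5xx responses are raised as HTTPError; the code is on the exception.
        return e.code
```

Calling fetch_status(request) performs the actual network request; it is left uncalled here so the snippet can be read and run without network access.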

Answer 1: (score: 0)

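You need to specify the URL scheme. When you visit a URL in a browser, the scheme defaults to HTTP. urllib2, however, does not do this for you: you have to add http:// to the beginning of the URL, otherwise it raises an error: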

ValueError: unknown url type: ow.ly/LApK30cbLKj
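
One way to guard against a missing scheme is to normalize the URL before building the request. The sketch below uses Python 3's urllib.request (the successor to urllib2); ensure_scheme is a hypothetical helper written for this example, not part of any library, and defaulting to http is an assumption that mirrors what the browser does. Note that in Python 3 the ValueError is already raised when the Request object is constructed, so no network access is needed to see it.

```python
import re
import urllib.request

# Matches a leading scheme such as "http://" or "https://".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it has a scheme, else prepend default_scheme."""
    if _SCHEME_RE.match(url):
        return url
    return default_scheme + "://" + url


bare_url = "ow" + ".ly/LApK30cbLKj"

# Without a scheme, building the request fails before anything is sent.
try:
    urllib.request.Request(bare_url)
    error_message = None
except ValueError as e:
    error_message = str(e)

print(error_message)
print(ensure_scheme(bare_url))
```

A URL that already carries a scheme, such as an https:// address, passes through ensure_scheme unchanged.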

Answer 2: (score: 0)

As @enjoi mentioned, I used requests instead:

    import requests
    import traceback

    result = None
    try:
        # agen_cont.source_url is the poster's own object holding the URL
        result = requests.get(agen_cont.source_url)
    except requests.exceptions.Timeout as e:
        print '+++timeout exception: '
        print e
    except requests.exceptions.TooManyRedirects as e:
        print '+++ too many redirects exception: '
        print e
    except requests.exceptions.RequestException as e:
        print '+++ request exception: '
        print e
    except Exception:
        print '+++generic exception: ' + traceback.format_exc()

    # A Response is falsy for 4xx/5xx status codes, so this runs only on success
    if result:
        final_url = result.url  # URL after any redirects were followed
        print final_url
        response = result.content