Question

I am trying to set up error handling for the requests module in Python, so that I am notified when a URL is not available (i.e. HTTPError, ConnectionError, Timeout, etc.).

The problem I am running into is that I seem to get a 200 status response even for FAKE URLs.

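I have trawled through S.O. and various other web resources, and tried many different approaches that all appear to aim at the same goal, but so far I have come up empty.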

To keep things simple, I have stripped the code down to the bare basics.

import requests

urls = ['http://fake-website.com', 
        'http://another-fake-website.com',
        'http://yet-another-fake-website.com',
        'http://google.com']

for url in urls:
    r = requests.get(url,timeout=1)
    try:
        r.raise_for_status()
    except:
        pass
    if r.status_code != 200:
        print ("Website Error: ", url, r)
    else:
        print ("Website Good: ", url, r)

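I expect the first three URLs in the list to be classed as 'Website Error:', since they are URLs I have just made up. The final URL in the list is obviously real, so it should be the only one listed as 'Website Good:'.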

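What actually happens is that the first URL gets the correct result from the code, since it gives a response code of 503. According to https://httpstatus.io/, however, the next two URLs produce no status_code at all and only show ERROR with Cannot find URI. another-fake-website.com another-fake-website.com:80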

So I expect every URL in the list except the last one to show as 'Website Error:'.

Output

When running the script on a Raspberry Pi:

Python 2.7.9 (default, Sep 26 2018, 05:58:52) 
[GCC 4.9.2] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
('Website Error: ', 'http://fake-website.com', <Response [503]>)
('Website Good: ', 'http://another-fake-website.com', <Response [200]>)
('Website Good: ', 'http://yet-another-fake-website.com', <Response [200]>)
('Website Good: ', 'http://google.com', <Response [200]>)
>>>

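If I enter all 4 URLs into https://httpstatus.io/, I get the following result:

It shows a 503, a 200, and two URLs that have no status code and simply show an error.

Update

So I thought I would check this on Windows using PowerShell, following this example: https://stackoverflow.com/a/52762602/5251044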

This is the output:

c:\Testing>powershell -executionpolicy bypass -File .\AnyName.ps1
0 - http://fake-website.com
200 - http://another-fake-website.com
200 - http://yet-another-fake-website.com
200 - http://google.com

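As you can see, I am no further forward.

Update 2

After a further discussion with Fozoro HERE and trying various options without finding a fix, I thought I would try this code with urllib2 instead of requests.

Here is the changed code: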

from urllib2 import urlopen
import socket

urls = ['http://another-fake-website.com',
        'http://fake-website.com',
        'http://yet-another-fake-website.com',
        'http://google.com',
        'dskjhkjdhskjh.com',
        'doioieowwros.com']

for url in urls:

    try:
        r  = urlopen(url, timeout = 5)
        r.getcode()
    except:
        pass
    if r.getcode() != 200:
        print ("Website Error: ", url, r.getcode())
    else:
        print ("Website Good: ", url, r.getcode())

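Unfortunately the resulting output is still not correct, although it does differ slightly from the output of the previous code, see below: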

Python 2.7.9 (default, Sep 26 2018, 05:58:52) 
[GCC 4.9.2] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
('Website Good: ', 'http://another-fake-website.com', 200)
('Website Good: ', 'http://fake-website.com', 200)
('Website Good: ', 'http://yet-another-fake-website.com', 200)
('Website Good: ', 'http://google.com', 200)
('Website Good: ', 'dskjhkjdhskjh.com', 200)
('Website Good: ', 'doioieowwros.com', 200)
>>>

This time every URL shows a 200 response, which is very strange.

Answer 1

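For me, the cause turned out to be the page my ISP serves for invalid URLs: that page is what returns 200, not the fake site.

This can be verified by printing the content of the returned page with requests.get('http://fakesite').text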
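
A rough way to spot this kind of redirection without reading page content is to compare DNS results: if several unrelated, made-up hostnames all resolve to the same address, something between you and the internet is probably answering for names that do not exist. The following is only a minimal sketch, assuming Python 3 and the standard library; `resolve` and `shared_addresses` are illustrative names, not part of requests. Shared hosting and CDNs can also put real sites on one address, so treat a match as a hint rather than proof.

```python
import socket
from collections import defaultdict


def resolve(hostname):
    """Return the IPv4 address a hostname resolves to, or None if the lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        # The resolver reported that the name could not be resolved.
        return None


def shared_addresses(resolved):
    """Return {address: [hostnames]} for every address used by more than one hostname.

    `resolved` maps hostname -> address, with None for failed lookups.
    """
    by_address = defaultdict(list)
    for hostname, address in resolved.items():
        if address is not None:
            by_address[address].append(hostname)
    return {
        address: sorted(hostnames)
        for address, hostnames in by_address.items()
        if len(hostnames) > 1
    }
```

For example, build the input with `{host: resolve(host) for host in hosts}` using a few made-up hostnames such as the ones in the question. An empty result means no two of them landed on the same address.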

Answer 2

You should put r = requests.get(url,timeout=1) inside the try: block. Your code should then look like this:

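import requests

urls = ['http://fake-website.com',
        'http://another-fake-website.com',
        'http://yet-another-fake-website.com',
        'http://google.com']

for url in urls:
    r = None  # reset so a failed request cannot reuse the previous response
    try:
        r = requests.get(url, timeout=1)
        r.raise_for_status()
    except requests.exceptions.RequestException:
        # Covers HTTPError, ConnectionError, Timeout, etc.
        pass
    if r is None or r.status_code != 200:
        print("Website Error: ", url, r)
    else:
        print("Website Good: ", url, r)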

Output:

Website Error:  http://fake-website.com <Response [503]>
Website Error:  http://another-fake-website.com <Response [503]>
Website Error:  http://yet-another-fake-website.com <Response [503]>
Website Good:  http://google.com <Response [200]>
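
One thing to watch with this pattern: if the request itself raises and the exception is swallowed with a bare except: pass, the check that follows either fails because r was never assigned or silently reuses the response from the previous iteration. Returning a result from inside each except branch avoids both. The sketch below shows that idea using the standard library's urllib.request (the Python 3 successor to the urllib2 module tried in the question) rather than requests, so it runs without extra packages. `check_url` and its `opener` parameter are hypothetical names chosen for illustration.

```python
import socket
import urllib.error
import urllib.request


def check_url(url, timeout=5, opener=urllib.request.urlopen):
    """Return (ok, detail) for one URL, reporting each failure mode separately."""
    try:
        response = opener(url, timeout=timeout)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 503.
        # HTTPError is a subclass of URLError, so it has to be caught first.
        return False, "HTTP error %d" % exc.code
    except urllib.error.URLError as exc:
        # DNS failure, refused connection and similar network problems.
        return False, "connection failed: %s" % exc.reason
    except socket.timeout:
        # A timeout can surface directly instead of being wrapped in URLError.
        return False, "timed out"
    except ValueError as exc:
        # Malformed URL, for example one without a scheme such as 'http://'.
        return False, "invalid URL: %s" % exc
    status = response.getcode()
    return status == 200, "status %s" % status
```

A loop over the URL list can then print url together with the tuple returned by check_url(url). A scheme-less entry such as 'dskjhkjdhskjh.com' is reported as an invalid URL instead of inheriting an earlier result.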

I hope this helps!

Error handling in the Python requests module when a URL does not exist
