Question

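I have a simple function (in Python 3) that takes a URL and attempts to resolve it: it either prints the error code if there is one (e.g. 404) or resolves a shortened URL to its full URL. My URLs are in one column of a csv file, and the output is saved in the next column. The problem arises when the program encounters a URL whose server takes too long to respond: the program crashes. Is there a simple way to force urllib to print an error code if the server takes too long? I looked into Timeout on a function call, but that looks a little too complicated since I am just starting out. Any suggestions?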

i.e. (COL A) shorturl (COL B) http://deals.ebay.com/500276625

EDIT: if anyone gets the http.client.RemoteDisconnected error (like me), see this question/answer: http.client.RemoteDisconnected error while reading/parsing a list of URL's

Answer 1

Have a look at the docs:

urllib.request.urlopen(url, data=None[, timeout])
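The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used).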

You can set a realistic timeout (in seconds) for your process:

conn = urllib.request.urlopen(urlColumnElem, timeout=realistic_timeout_in_seconds)
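The docs passage quoted above mentions a global default timeout. As a minimal sketch of that alternative, the standard library's socket.setdefaulttimeout sets a default that urlopen falls back to when no timeout argument is passed. The name DEFAULT_TIMEOUT_SECONDS and the value 10 are assumptions chosen for illustration, not something from the question.

```python
import socket

# Hypothetical value; choose whatever suits the servers you are querying.
DEFAULT_TIMEOUT_SECONDS = 10

# Applies to sockets created from now on, including those urlopen creates
# when it is called without an explicit timeout argument.
socket.setdefaulttimeout(DEFAULT_TIMEOUT_SECONDS)

print(socket.getdefaulttimeout())
```

Passing timeout= to each urlopen call, as in the line above, is usually the more explicit choice, because the global default also affects every other socket the program opens.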

To stop your code from crashing, move everything inside a try/except block:

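import socket
import urllib.error
import urllib.request

def urlparse(urlColumnElem, realistic_timeout_in_seconds=5):
    try:
        conn = urllib.request.urlopen(
                   urlColumnElem,
                   timeout=realistic_timeout_in_seconds
               )
        # geturl() returns the final URL after any redirects
        redirect = conn.geturl()
        # check redirect
        if redirect == urlColumnElem:
            # same URL: nothing was redirected
            return redirect
        else:
            # different URL: the short URL was resolved
            return redirect

    except urllib.error.HTTPError as e:
        # the server answered with an error status such as 404
        return e.code
    except urllib.error.URLError as e:
        # connection-level failure; a timeout while connecting also lands here
        return 'URL_Error'
    except socket.timeout as e:
        # the server accepted the connection but took too long to respond
        return 'Connection timeout'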

Now if a timeout occurs, you will catch the exception and the program will not crash.
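One detail worth knowing: a timeout that happens while connecting reaches the caller wrapped in a URLError (with the socket.timeout stored in its reason attribute), whereas a timeout while waiting for the response is raised as a bare socket.timeout. The sketch below shows one way to report both as a timeout. The helper names classify_error and resolve, and the returned labels, are assumptions made for illustration.

```python
import socket
import urllib.error
import urllib.request


def classify_error(exc):
    """Map an exception raised by urlopen to a short label (labels are illustrative)."""
    if isinstance(exc, urllib.error.HTTPError):
        # HTTPError is a subclass of URLError, so it must be checked first.
        return exc.code
    if isinstance(exc, urllib.error.URLError):
        # A timeout during the connection attempt is wrapped in URLError.reason.
        if isinstance(exc.reason, socket.timeout):
            return 'Connection timeout'
        return 'URL_Error'
    if isinstance(exc, socket.timeout):
        # A timeout while waiting for the response is raised directly.
        return 'Connection timeout'
    raise exc


def resolve(url, timeout=5):
    """Return the final URL after redirects, or a label describing the failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return conn.geturl()
    except (urllib.error.URLError, socket.timeout) as e:
        return classify_error(e)
```

Keeping the classification in its own function means it can be checked with hand-built exceptions, without touching the network.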

Good luck :)

Answer 2

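First, there is a timeout parameter that controls how long urlopen is allowed to take. Next, a timeout in urlopen should simply raise an exception, more precisely a socket.timeout. If you do not want it to abort the program, you just have to catch it: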

import socket
import urllib.error
import urllib.request

def urlparse(urlColumnElem, timeout=5):   # allow 5 seconds by default
    try:
        conn = urllib.request.urlopen(urlColumnElem, timeout=timeout)
    except urllib.error.HTTPError as e:
        return e.code
    except urllib.error.URLError as e:
        return 'URL_Error'
    except socket.timeout:
        return 'Timeout'
    else:
        ...
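Since the question reads URLs from one csv column and writes results to the next, here is a rough sketch of how a function like urlparse could be applied to each row. The helper add_result_column, the stand-in fake_resolver, and the example.invalid URLs are all hypothetical; the in-memory io.StringIO objects stand in for real files so the sketch runs without network or disk access.

```python
import csv
import io


def add_result_column(rows, resolver):
    """Append resolver(first cell) to every non-empty row and return the new rows."""
    results = []
    for row in rows:
        if not row:
            results.append(row)  # keep blank lines untouched
            continue
        results.append(list(row) + [str(resolver(row[0]))])
    return results


def fake_resolver(url):
    """Stand-in for urlparse so the sketch runs without network access."""
    return 'Timeout' if 'slow' in url else url


# In real use these would be files opened with open(..., newline='').
source = io.StringIO("http://example.invalid/a\nhttp://example.invalid/slow\n")
rows = add_result_column(csv.reader(source), fake_resolver)

target = io.StringIO()
csv.writer(target).writerows(rows)
print(target.getvalue())
```

Because the resolver returns a value for every failure instead of raising, one slow server no longer stops the loop over the remaining rows.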

urllib request fails when page takes too long to respond