Question

I have a list of URLs.

I retrieve their contents with the following:

for url in url_list:
    req = urllib2.Request(url)
    resp = urllib2.urlopen(req, timeout=5)
    resp_page = resp.read()
    print resp_page

When a request times out, the program crashes. If I get socket.timeout: timed out, I just want to move on and read the next URL. How can I do that?

Thanks

Answer 1

There are already answers, but I want to point out that urllib2 may not be the only module responsible for this behavior.

As pointed out here (and as the problem description also suggests), the exception may come from the socket library.

In that case, just add another except clause:

import socket

try:
    resp = urllib2.urlopen(req, timeout=5)
except urllib2.URLError:
    print "Bad URL or timeout"
except socket.timeout:
    print "socket timeout"
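Putting both except clauses into the loop from the question could look like the sketch below. It is a minimal illustration, not the answerer's code: the function name fetch_all and the opener parameter are hypothetical, and the import fallback assumes the Python 3 urllib.request functions take the same arguments as their urllib2 counterparts. The read() call sits inside the try block because a timeout can also occur while the body is being read.

```python
import socket

try:  # Python 2, as used in the question
    from urllib2 import Request, urlopen, URLError
except ImportError:  # assumption: Python 3 equivalents with the same call shape
    from urllib.request import Request, urlopen
    from urllib.error import URLError


def fetch_all(url_list, timeout=5, opener=urlopen):
    """Return {url: body} for every URL that could be fetched in time.

    `opener` is a hypothetical hook so the loop can be exercised
    without network access; by default it is the real urlopen.
    """
    pages = {}
    for url in url_list:
        try:
            resp = opener(Request(url), timeout=timeout)
            # read() can time out as well, so keep it inside the try block
            pages[url] = resp.read()
        except URLError as exc:
            print("skipping %s: %s" % (url, exc))
            continue  # move on to the next URL
        except socket.timeout:
            print("skipping %s: socket timeout" % url)
            continue
    return pages
```

Calling fetch_all(url_list) returns only the pages that were read successfully; URLs that fail or time out are reported and skipped.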

Answer 2

I'm going to go ahead and assume that by "crashes" you mean "raises a URLError", as described in the urllib2.urlopen docs. See the Errors and Exceptions section of the Python tutorial.

for url in url_list:
    req = urllib2.Request(url)
    try:
        resp = urllib2.urlopen(req, timeout=5)
    except urllib2.URLError:
        print "Bad URL or timeout"
        continue # skips to the next iteration of the loop
    resp_page = resp.read()
    print resp_page

Answer 3

It sounds like you just need to catch the timeout exception. I don't get the socket.timeout message that you are seeing.

req = urllib2.Request("http://127.0.0.2")
try:
    resp = urllib2.urlopen(req, timeout=5)
except urllib2.URLError:
    print "Timeout!"

Obviously, you'll need a URL that actually times out (127.0.0.2 may not time out on your machine).
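Because reproducing a real timeout depends on the network, the handling logic can also be checked offline by building the exceptions by hand. The sketch below assumes that a timeout during connection is typically wrapped in a URLError whose reason is the socket.timeout instance, while a timeout during read() is raised as a bare socket.timeout. The helper name is_timeout is hypothetical.

```python
import socket

try:  # Python 2, as used in the answers
    from urllib2 import URLError
except ImportError:  # assumption: Python 3 equivalent
    from urllib.error import URLError


def is_timeout(exc):
    """Return True for a bare socket.timeout or a URLError wrapping one."""
    if isinstance(exc, socket.timeout):
        return True
    # getattr guards against URLError subclasses that lack a reason attribute
    reason = getattr(exc, "reason", None)
    return isinstance(exc, URLError) and isinstance(reason, socket.timeout)
```

Inside an except clause, is_timeout(exc) lets you print "Timeout!" only for real timeouts and a different message for other URL errors.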

Skip URL if it times out
