Question

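Sometimes I can handle socket.timeout without problems, but at other times I get a socket timeout error and my script stops abruptly. Is there something I'm missing in my exception handling? What is going on?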

It happens at random in either of the following pieces of code:

First snippet:

for _ in range(max_retries):
    try:
        req = Request(url, headers={'User-Agent' :'Mozilla/5.0'})
        response = urlopen(req,timeout=5)
        break
    except error.URLError as err: 
        print("URL that generated the error code: ", url)
        print("Error description:",err.reason)
    except error.HTTPError as err:
        print("URL that generated the error code: ", url)
        print("Error code:", err.code)
        print("Error description:", err.reason)
    except socket.timeout:
        print("URL that generated the error code: ", url)
        print("Error description: No response.")
    except socket.error:
        print("URL that generated the error code: ", url)
        print("Error description: Socket error.")

if response.getheader('Content-Type').startswith('text/html'):
    htmlBytes = response.read()
    htmlString = htmlBytes.decode("utf-8")
    self.feed(htmlString)

Second snippet:

for _ in range(max_retries):
    try:
        req = Request(i, headers={'User-Agent' :'Mozilla/5.0'})
        with urlopen(req,timeout=5) as response, open(aux, 'wb') as out_file:
            shutil.copyfileobj(response, out_file)  
        with open(os.path.join(path, fname), 'a') as f:
            f.write(("link" + str(intaux) + "-" + auxstr + str(index) + i[-4:] + " --- " + metadata[index%batch] + '\n'))
        break
    except error.URLError as err:
        print("URL that generated the error code: ", i)
        print("Error description:",err.reason)
    except error.HTTPError as err:
        print("URL that generated the error code: ", i)
        print("Error code:", err.code)
        print("Error description:", err.reason)
    except socket.timeout:
        print("URL that generated the error code: ", i)
        print("Error description: No response.")
    except socket.error:
        print("URL that generated the error code: ", i)
        print("Error description: Socket error.")

Error:

Traceback (most recent call last):
  File "/mydir/crawler.py", line 202, in <module>
    spider("urls.txt", maxPages=0, debug=1, dailyRequests=9600) 
  File "/mydir/crawler.py", line 142, in spider
    parser.getLinks(url + "?start=" + str(currbot) + "&tab=" + auxstr,auxstr)
  File "/mydir/crawler.py", line 81, in getLinks
    htmlBytes = response.read()
  File "/usr/lib/python3.5/http/client.py", line 455, in read
    return self._readall_chunked()
  File "/usr/lib/python3.5/http/client.py", line 561, in _readall_chunked
    value.append(self._safe_read(chunk_left))
  File "/usr/lib/python3.5/http/client.py", line 607, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/lib/python3.5/socket.py", line 575, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.5/ssl.py", line 929, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.5/ssl.py", line 791, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib/python3.5/ssl.py", line 575, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
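
The traceback points at response.read() in getLinks, which in the first snippet sits outside the try block, so none of the except clauses can see the timeout. One possible way to cover both the connection and the read is to keep them inside the same try and retry the whole request. The sketch below assumes a helper name (fetch_html) and an opener parameter that are not in the original code; the parameter exists only so the function can be exercised without a network.

```python
import socket
from urllib import error
from urllib.request import Request, urlopen


def fetch_html(url, max_retries=3, timeout=5, opener=urlopen):
    """Return the decoded HTML body, or None if every attempt fails.

    `fetch_html` and `opener` are hypothetical names used for illustration.
    """
    for _ in range(max_retries):
        try:
            req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
            with opener(req, timeout=timeout) as response:
                content_type = response.getheader('Content-Type') or ''
                if not content_type.startswith('text/html'):
                    return None
                # read() can time out too, so it stays inside the try block
                return response.read().decode('utf-8')
        except error.HTTPError as err:  # subclass of URLError, so it goes first
            print("HTTP error:", err.code, err.reason)
        except error.URLError as err:
            print("URL error:", err.reason)
        except socket.timeout:
            print("Timed out:", url)
        except OSError:
            print("Socket error:", url)
    return None
```

Each retry opens a fresh connection, which avoids calling read() again on a response whose socket has already timed out.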

Edit:

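I noticed I had left out a few lines of code. Thanks to @tdelaney, I have added them to the code above, and I am posting the solution I wrote. If you post a solution, or have a better way to solve this, I will mark your answer as correct.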

Solution:

for _ in range(max_retries):
    try:
        req = Request(url, headers={'User-Agent' :'Mozilla/5.0'})
        response = urlopen(req,timeout=5)
        break
    except error.URLError as err: 
        print("URL that generated the error code: ", url)
        print("Error description:",err.reason)
    except error.HTTPError as err:
        print("URL that generated the error code: ", url)
        print("Error code:", err.code)
        print("Error description:", err.reason)
    except socket.timeout:
        print("URL that generated the error code: ", url)
        print("Error description: No response.")
    except socket.error:
        print("URL that generated the error code: ", url)
        print("Error description: Socket error.")

if response.getheader('Content-Type').startswith('text/html'):
    for _ in range(max_retries):
        try:
            htmlBytes = response.read()
            htmlString = htmlBytes.decode("utf-8")
            self.feed(htmlString)
            break
        except error.URLError as err: 
            print("URL that generated the error code: ", url)
            print("Error description:",err.reason)
        except error.HTTPError as err:
            print("URL that generated the error code: ", url)
            print("Error code:", err.code)
            print("Error description:", err.reason)
        except socket.timeout:
            print("URL that generated the error code: ", url)
            print("Error description: No response.")
        except socket.error:
            print("URL that generated the error code: ", url)
            print("Error description: Socket error.")
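
A side note on the order of the except clauses: error.HTTPError is a subclass of error.URLError, so when the URLError clause comes first, the HTTPError clause never runs. The small sketch below mimics how a try statement picks the first matching clause; first_match is a hypothetical helper written only for this illustration, and the URL is a placeholder.

```python
import socket
from urllib import error


def first_match(exc, handler_order):
    """Return the name of the first class in handler_order that matches exc,
    the same way a try statement picks the first matching except clause."""
    for cls in handler_order:
        if isinstance(exc, cls):
            return cls.__name__
    return None


# Placeholder HTTP error, built by hand so no request is made
http_err = error.HTTPError("http://example.invalid/", 404, "Not Found", None, None)

original_order = [error.URLError, error.HTTPError, socket.timeout, socket.error]
fixed_order = [error.HTTPError, error.URLError, socket.timeout, socket.error]

print(first_match(http_err, original_order))  # URLError
print(first_match(http_err, fixed_order))     # HTTPError
```

Putting the more specific class first lets the handler print err.code as intended.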

Answer 1

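The Python "Requests" library uses its own set of exceptions for errors related to the HTTP protocol and to sockets. It automatically maps the exceptions raised by the underlying socket calls to custom exceptions defined in requests.exceptions.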

So the exception raised here...

import requests

try:
    # requests.get replaces the Request/urlopen pair from urllib
    requests.get("http://stackoverflow.com", headers={'User-Agent': 'Mozilla/5.0'}, timeout=5)
except requests.exceptions.Timeout:
    print("Session Timed Out!")

...corresponds to the exception raised here...

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)  # socket.timeout is only raised once a timeout is set
try:
    s.connect(("127.0.0.1", 80))
except socket.timeout:
    print("Session Timed Out")
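
The same exception also appears when a read stalls, which is the case in the question's traceback. As a minimal sketch that runs without any external host, the code below connects to a local listening socket that never sends data; recv_with_timeout is a hypothetical name and the 0.2 second timeout is arbitrary.

```python
import socket


def recv_with_timeout(timeout=0.2):
    """Connect to a local server that never replies and report what happens."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        client.settimeout(timeout)
        client.connect(server.getsockname())
        try:
            client.recv(1024)  # nothing is ever sent, so this blocks
        except socket.timeout:
            return "timed out"
        return "received data"
    finally:
        client.close()
        server.close()


print(recv_with_timeout())
```

The connection itself succeeds; it is the later recv call that raises socket.timeout, so a handler placed only around the connect step would not see it.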

Your code, rewritten to use Requests...

import requests

for _ in range(max_retries):
    try:
        response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=5)
        response.raise_for_status()  # turn 4xx/5xx responses into HTTPError
        break
    except requests.exceptions.HTTPError as err:
        print("URL that generated the error code: ", url)
        print("Error code:", err.response.status_code)
        print("Error description:", err.response.reason)
    except requests.exceptions.Timeout:
        print("URL that generated the error code: ", url)
        print("Error description: Session timed out.")
    except requests.exceptions.ConnectionError:
        print("URL that generated the error code: ", url)
        print("Error description: Socket error.")

Python: Socket.timeout not handled by except
