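I am building a Python script that searches my database for all URLs and then follows each one to find broken links. The script needs to use exception handling to log when it hits an error opening a link, but it has started running into an error for which I have been completely unable to write an except statement: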
Traceback (most recent call last):
  File "exceptionerror.py", line 97, in <module>
    raw_response = response.read().decode('utf8', errors='ignore')
  File "/usr/lib/python3.4/http/client.py", line 512, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python3.4/http/client.py", line 662, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/lib/python3.4/socket.py", line 371, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer
I have tried the following:
except SocketError as inst:
    brokenlinksflag = 1
    brokenlinks = articlelinks[j] + ' ' + sys.exc_info()[0] + ', ' + brokenlinks
    continue
and
except ConnectionResetError as inst:
    brokenlinksflag = 1
    brokenlinks = articlelinks[j] + ' ' + sys.exc_info()[0] + ', ' + brokenlinks
    continue
and even a completely generic except clause, meant to catch every error just so that it would not kill the whole script:
except:
    print("This link was not caught by defined exceptions: " + articlelinks[j])
    continue
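I am at a complete loss as to how to get my script to catch this error so that it can keep checking for broken links instead of failing hard. The error is intermittent, so I do not believe the link is actually broken. Even though I have identified the URL, simply catching it and skipping it by hand feels like cheating, since my goal is to handle the exception properly. Can someone tell me how to handle this exception?
For reference, here is my full loop: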
for j in range(0, len(articlelinks)):
    try:
        req=urllib.request.Request(articlelinks[j], None, {'User-agent' : 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'})
        response = urllib.request.urlopen(req)
    except urllib.request.HTTPError as inst:
        brokenlinksflag = 1
        brokenlinks = articlelinks[j] + ' ' + format(inst) + ', ' + brokenlinks
        continue
    except TimeoutError:
        brokenlinksflag = 1
        brokenlinks = articlelinks[j] + ' Timeout Error, ' + brokenlinks
        continue
    except urllib.error.URLError as inst:
        brokenlinksflag = 1
        brokenlinks = articlelinks[j] + ' ' + format(inst) + ', ' + brokenlinks
        continue
    except SocketError as inst:
        brokenlinksflag = 1
        brokenlinks = articlelinks[j] + ' ' + sys.exc_info()[0] + ', ' + brokenlinks
        continue
    except:
        print("This article killed everything: " + articlelinks[j])
        exit()
Answer 0 (score: 4)
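Solved! The problem was that I had been troubleshooting the connection in order to handle the ConnectionResetError. However, a closer look at the full error showed that it was raised while processing the response, not while opening the URL: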
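  File "exceptionerror.py", line 97, in <module>
    raw_response = response.read().decode('utf8', errors='ignore')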
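Since the connection was reset rather than terminated outright, the script was able to open the URL successfully, and the error was only raised when it tried to read and decode the response. That meant the try/except block was around the wrong line.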
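To make the point concrete, here is a minimal sketch that simulates the situation without touching the network. `FakeResponse` and `fake_urlopen` are hypothetical stand-ins invented for this illustration (they are not part of `urllib`); the only assumption is that opening succeeds and the reset happens during `read()`, as the traceback suggests.

```python
class FakeResponse:
    """Hypothetical stand-in for the object returned by urllib.request.urlopen."""

    def read(self):
        # Simulates the peer resetting the connection while the body is read.
        raise ConnectionResetError(104, "Connection reset by peer")


def fake_urlopen(url):
    """Hypothetical opener: "opening" the URL succeeds, as in the traceback."""
    return FakeResponse()


events = []

# A try/except around the open call alone never sees the error.
try:
    response = fake_urlopen("http://example.invalid/")
    events.append("open succeeded")
except ConnectionResetError:
    events.append("caught while opening")  # not reached in this simulation

# The error only surfaces here, so this is the call that needs guarding.
try:
    response.read()
except ConnectionResetError as inst:
    events.append("caught while reading: " + format(inst))

print(events)
```

This also suggests why even the bare `except:` in the question did not help: an except clause only covers statements inside its own `try` block, and the `read()` call on line 97 appears to have been outside it.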
Putting the exception handling around the line that reads and decodes the response resolved the issue:
raw_response = response.read().decode('utf8', errors='ignore')
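One way this could look in practice is sketched below. It is not necessarily the exact code used in the original script: `check_link` and its `opener` parameter are hypothetical names introduced here so that the open and the read sit inside the same `try` block and so that the function can be exercised without a network connection.

```python
import urllib.error
import urllib.request

USER_AGENT = "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"


def check_link(url, opener=urllib.request.urlopen):
    """Return None if the URL opens and reads cleanly, else an error description.

    `opener` is a hypothetical hook (defaulting to urlopen) used for testing.
    """
    try:
        req = urllib.request.Request(url, None, {"User-agent": USER_AGENT})
        response = opener(req)
        # Reading inside the same try block means a reset that happens
        # while the body is being received is caught as well.
        response.read().decode("utf8", errors="ignore")
    except urllib.error.URLError as inst:
        # HTTPError is a subclass of URLError, so both land here.
        return format(inst)
    except ConnectionResetError as inst:
        return "Connection reset: " + format(inst)
    except OSError as inst:
        # Remaining socket-level failures (timeouts, refused connections, ...).
        return format(inst)
    return None


if __name__ == "__main__":
    class ResetOnRead:
        """Hypothetical response whose read() is interrupted by a reset."""

        def read(self):
            raise ConnectionResetError(104, "Connection reset by peer")

    print(check_link("http://example.invalid/", opener=lambda req: ResetOnRead()))
```

Since Python 3.3, `ConnectionResetError` is a subclass of `OSError` and `socket.error` is an alias of `OSError`, so a single `except OSError` would also cover this case; the separate clause is kept here only so that resets can be labelled distinctly in the log.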