Ways to catch multiple exceptions

Question

This is a web-mining script.

def printer(q,missing):
    while 1:
        tmpurl=q.get()
        try:
            image=urllib2.urlopen(tmpurl).read()
        except httplib.HTTPException:
            missing.put(tmpurl)
            continue
        wf=open(tmpurl[-35:]+".jpg","wb")
        wf.write(image)
        wf.close()

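q is a Queue() of URLs, and missing is an empty queue that collects the URLs that raise errors.

It runs in parallel in 10 threads.

Every time I run it, I get this: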

  File "C:\Python27\lib\socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "C:\Python27\lib\httplib.py", line 541, in read
    return self._read_chunked(amt)
  File "C:\Python27\lib\httplib.py", line 592, in _read_chunked
    value.append(self._safe_read(amt))
  File "C:\Python27\lib\httplib.py", line 649, in _safe_read
    raise IncompleteRead(''.join(s), amt)
IncompleteRead: IncompleteRead(5274 bytes read, 2918 more expected)

But I do use except... I have also tried catching other exceptions, such as

httplib.IncompleteRead
urllib2.URLError

and even

image=urllib2.urlopen(tmpurl,timeout=999999).read()

but none of this works.

How can I catch IncompleteRead and URLError?

Answer 1

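I think the right answer to this question depends on what you consider an "error-raising URL".

Ways to catch multiple exceptions

If you think any URL that raises an exception should be added to the missing queue, then you can do: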

try:
    image=urllib2.urlopen(tmpurl).read()
except (httplib.HTTPException, httplib.IncompleteRead, urllib2.URLError):
    missing.put(tmpurl)
    continue

This will catch any of those three exceptions and add that URL to the missing queue. More simply, you can do:

try:
    image=urllib2.urlopen(tmpurl).read()
except:
    missing.put(tmpurl)
    continue

to catch any exception, but this is not considered Pythonic and could hide other possible errors in your code.
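As a rough illustration of the tuple form, here is a minimal sketch written for Python 3, where httplib and urllib2 became http.client and urllib.error. The names record_failures and fake_fetch are hypothetical; fake_fetch stands in for urlopen(url).read() so the example runs without a network.

```python
import http.client
import queue
import urllib.error


def record_failures(urls, fetch, missing):
    """Call fetch(url) for each URL; queue the ones that raise a listed network error."""
    fetched = {}
    for url in urls:
        try:
            fetched[url] = fetch(url)
        except (http.client.HTTPException, urllib.error.URLError):
            # Only the listed exception types are handled here; anything
            # else (for example a TypeError caused by a bug) still propagates.
            missing.put(url)
    return fetched


def fake_fetch(url):
    # Hypothetical stand-in for urlopen(url).read(), so no network is needed.
    if url == "bad-read":
        raise http.client.IncompleteRead(b"partial", 10)
    if url == "bad-url":
        raise urllib.error.URLError("unreachable")
    return b"image-bytes"


missing = queue.Queue()
fetched = record_failures(["ok", "bad-read", "bad-url"], fake_fetch, missing)
failed = [missing.get() for _ in range(missing.qsize())]
```

Naming the exception types keeps unrelated bugs visible, which is the main argument against a bare except.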

If by "error-raising URL" you mean any URL that raises an httplib.HTTPException, but you still want to keep processing when other errors occur, then you can do the following:

try:
    image=urllib2.urlopen(tmpurl).read()
except httplib.HTTPException:
    missing.put(tmpurl)
    continue
except (httplib.IncompleteRead, urllib2.URLError):
    continue

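This will add the URL to the missing queue if it raises an httplib.HTTPException, but will otherwise catch httplib.IncompleteRead and urllib2.URLError and prevent your script from crashing. Note that httplib.IncompleteRead is a subclass of httplib.HTTPException, so in practice the first clause handles it and the URL still ends up in missing; only urllib2.URLError reaches the second clause.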
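The order of except clauses matters when one exception class derives from another. The following is a small sketch, again using the Python 3 module names (http.client, urllib.error) as an assumption; classify and the two raise_* helpers are hypothetical names used only for illustration.

```python
import http.client
import urllib.error


def classify(fetch, url):
    """Return a label for the except clause that handles the error from fetch(url)."""
    try:
        fetch(url)
    except http.client.HTTPException:
        # IncompleteRead derives from HTTPException, so it lands here.
        return "http"
    except urllib.error.URLError:
        return "url"
    return "ok"


def raise_incomplete(url):
    # Simulates a response body that was cut short.
    raise http.client.IncompleteRead(b"", 1)


def raise_url_error(url):
    # Simulates a host that cannot be reached.
    raise urllib.error.URLError("no route")


is_subclass = issubclass(http.client.IncompleteRead, http.client.HTTPException)
incomplete_label = classify(raise_incomplete, "u")
url_error_label = classify(raise_url_error, "u")
ok_label = classify(lambda u: b"", "u")
```

Python tries the clauses from top to bottom and runs the first one that matches, so a clause for a subclass has to come before the clause for its base class to have any effect.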

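Iterating a Queue

As an aside, while 1 loops always worry me a little. You should be able to loop through the queue contents using the following pattern (it assumes a "STOP" sentinel is put on the queue once there is no more work), though you are free to keep doing it your way:

for tmpurl in iter(q.get, "STOP"):
    # rest of your code goes here
    pass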
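The sentinel pattern for draining a queue can be sketched as follows. This is a minimal single-threaded example for Python 3 (where the module is named queue); the helper name drain and the sample URLs are made up for illustration.

```python
import queue

STOP = "STOP"  # sentinel value; assumed never to be a real URL


def drain(q):
    """Collect items from q until the sentinel is seen."""
    items = []
    # iter(callable, sentinel) calls q.get() repeatedly and stops
    # as soon as it returns a value equal to the sentinel.
    for item in iter(q.get, STOP):
        items.append(item)
    return items


q = queue.Queue()
for url in ["a.jpg", "b.jpg"]:
    q.put(url)
q.put(STOP)

urls = drain(q)
```

With several worker threads reading the same queue, each worker consumes one sentinel, so the producer would need to put one "STOP" per worker.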

Safely handling files

In addition, unless it is absolutely necessary to do otherwise, you should use context managers to open and modify files. So your three file-operation lines would become:

with open(tmpurl[-35:]+".jpg","wb") as wf:
    wf.write(image)

The context manager takes care of closing the file, and will do so even if an exception occurs while writing to it.
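To see the closing behaviour in action, here is a small sketch that raises an exception inside a with block and then checks the file object. The temporary path and the simulated RuntimeError are assumptions made for the demonstration only.

```python
import os
import tempfile

# Write into a throwaway directory so the example leaves the working directory alone.
path = os.path.join(tempfile.mkdtemp(), "image.jpg")
handle = None

try:
    with open(path, "wb") as wf:
        handle = wf  # keep a reference so the file can be inspected afterwards
        wf.write(b"partial data")
        raise RuntimeError("simulated failure while writing")
except RuntimeError:
    pass

# The with block closed the file on the way out, despite the exception.
closed_after_error = handle.closed
```

With the explicit open/write/close sequence, the same failure would skip the close() call and leave the file handle open until it is garbage collected.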
