I'm using ThreadPoolExecutor and need to abort the whole computation if any of the worker threads fails.
Example 1. This prints Success despite the error, because ThreadPoolExecutor does not re-raise worker exceptions automatically.
from concurrent.futures import ThreadPoolExecutor

def task():
    raise ValueError

with ThreadPoolExecutor() as executor:
    executor.submit(task)
print('Success')
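The exception in Example 1 is not lost. The executor stores it on the Future returned by submit(), and Future.exception() hands it back without raising. A minimal sketch (the message text is added for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def task():
    raise ValueError("task failed")

with ThreadPoolExecutor() as executor:
    future = executor.submit(task)

# Leaving the with-block waited for the task, so the future is done.
# exception() returns the stored exception instead of raising it.
stored = future.exception()
print(type(stored).__name__, stored)  # ValueError task failed
```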
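Example 2. This correctly crashes the main thread, because .result() re-raises the exception. But it first waits for the first task to finish, so the main thread only sees the exception after a delay.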
import time
from concurrent.futures import ThreadPoolExecutor

def task(should_raise):
    time.sleep(1)
    if should_raise:
        raise ValueError

with ThreadPoolExecutor() as executor:
    executor.submit(task, False).result()
    executor.submit(task, True).result()
print('Success')
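How can I detect a worker exception in the main thread (more or less) immediately after it happens, so that I can handle the failure and abort the remaining workers?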
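For reference, the standard library offers a direct way to do this that neither answer below uses: concurrent.futures.wait() with return_when=FIRST_EXCEPTION returns as soon as any future raises. A minimal sketch, assuming Python 3.9+ for shutdown(cancel_futures=True); task() and run_all() are hypothetical names. Note that only tasks that have not started yet get cancelled; a task that is already running cannot be interrupted and keeps running in the background.

```python
import time
from concurrent.futures import FIRST_EXCEPTION, ThreadPoolExecutor, wait

def task(delay, should_raise):
    time.sleep(delay)
    if should_raise:
        raise ValueError(f"task failed after {delay}s")
    return delay

def run_all(jobs):
    """Run (delay, should_raise) jobs and raise the first worker exception promptly."""
    executor = ThreadPoolExecutor(max_workers=2)
    futures = [executor.submit(task, d, r) for d, r in jobs]
    # Returns as soon as any future raises, or when all of them have finished.
    done, _ = wait(futures, return_when=FIRST_EXCEPTION)
    # Cancel queued tasks that have not started yet (Python 3.9+) and do not
    # block on tasks that are already running.
    executor.shutdown(wait=False, cancel_futures=True)
    for fut in done:
        if fut.exception() is not None:
            raise fut.exception()
    # No exception: FIRST_EXCEPTION behaved like ALL_COMPLETED.
    return [fut.result() for fut in futures]
```

With jobs such as [(1.0, False), (0.1, True), (1.0, False)], run_all() raises the ValueError after roughly 0.1 s instead of waiting for the first one-second job.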
Answer 0 (score: 2)
First, we have to submit all the tasks before asking for any result. Otherwise, the threads don't even run in parallel:
futures = []
with ThreadPoolExecutor() as executor:
    futures.append(executor.submit(good_task))
    futures.append(executor.submit(bad_task))
    for future in futures:
        future.result()
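The claim that submit-then-collect runs in parallel while submit(...).result() runs serially can be checked with a threading.Barrier that only opens when two tasks are waiting on it at the same time. This is a sketch; meet() and the two driver functions are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def meet(barrier):
    # Blocks until both tasks are inside wait(); raises
    # threading.BrokenBarrierError if the partner never arrives in time.
    barrier.wait()
    return "met"

def submit_all_then_collect():
    barrier = threading.Barrier(2, timeout=1.0)
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(meet, barrier) for _ in range(2)]
        return [f.result() for f in futures]

def submit_and_wait_each():
    barrier = threading.Barrier(2, timeout=1.0)
    with ThreadPoolExecutor(max_workers=2) as executor:
        # The second task is only submitted after the first one finished,
        # so the first task waits alone and the barrier times out.
        return [executor.submit(meet, barrier).result() for _ in range(2)]
```

submit_all_then_collect() returns ["met", "met"], while submit_and_wait_each() raises BrokenBarrierError after one second because the two tasks never overlap.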
Now we can store the exception info in a variable that both the main thread and the worker threads can access:
exc_info = None
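The main thread cannot actually kill its worker threads, so we have the workers check whether the exception info has been set and stop once it has: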
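import sys
import time

def good_task():
    global exc_info
    # Keep polling until some worker has reported an exception.
    while not exc_info:
        time.sleep(0.1)

def bad_task():
    global exc_info
    time.sleep(0.2)
    try:
        raise ValueError()
    except Exception:
        # Publish the failure so the other workers and the main thread see it.
        exc_info = sys.exc_info()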
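The same stop-flag idea can also be written with threading.Event instead of polling a global variable; this is a swapped-in primitive, not what the answer uses. Event.wait(timeout) sleeps like time.sleep() but returns True as soon as the flag is set, so a worker reacts immediately instead of at the next poll. A sketch with hypothetical names (run(), the errors list):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def good_task(stop):
    # Work in short slices; Event.wait returns True once the flag is set.
    for _ in range(50):
        if stop.wait(timeout=0.1):
            return "stopped"
    return "finished"

def bad_task(stop, errors):
    time.sleep(0.2)  # stand-in for some real work
    try:
        raise ValueError("bad_task failed")
    except Exception as exc:
        errors.append(exc)  # publish the failure...
        stop.set()          # ...and tell every other worker to stop

def run():
    stop = threading.Event()
    errors = []
    with ThreadPoolExecutor(max_workers=2) as executor:
        good = executor.submit(good_task, stop)
        executor.submit(bad_task, stop, errors)
    # The with-block has waited for both workers; hand back what they reported.
    return good.result(), errors
```

Here run() returns ("stopped", [ValueError(...)]) after about 0.2 s, and the caller can then re-raise errors[0] just like exc_info below.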
After all threads have terminated, the main thread can check the variable holding the exception info. If it is set, we re-raise the exception:
if exc_info:
    # Re-raise the original exception object with the worker's traceback.
    raise exc_info[1].with_traceback(exc_info[2])
print('Success')
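Re-raising exc_info[1].with_traceback(exc_info[2]) keeps the worker's frames in the traceback, so the report still points at the line in bad_task that failed even though the exception surfaces in the main thread. A small sketch that checks this using a plain threading.Thread; reraise_in_main() is a hypothetical helper:

```python
import sys
import threading
import traceback

exc_info = None  # filled in by the worker thread

def bad_task():
    global exc_info
    try:
        raise ValueError("boom")
    except Exception:
        exc_info = sys.exc_info()

worker = threading.Thread(target=bad_task)
worker.start()
worker.join()

def reraise_in_main():
    if exc_info:
        raise exc_info[1].with_traceback(exc_info[2])

try:
    reraise_in_main()
except ValueError as err:
    # Frame names from the outermost call down to where the error started.
    frame_names = [f.name for f in traceback.extract_tb(err.__traceback__)]
    print(frame_names)
```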
Answer 1 (score: 1)
I think I would implement it like this:
In the main process, I create 2 queues:
import multiprocessing as mp
error_queue = mp.Queue()
cancel_queue = mp.Queue()
I create each ThreadPoolExecutor and pass these queues to it as arguments:
import concurrent.futures

class MyExecutor(concurrent.futures.ThreadPoolExecutor):
    def __init__(self, error_queue, cancel_queue, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.error_queue = error_queue
        self.cancel_queue = cancel_queue
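Each ThreadPoolExecutor has a main loop. In this loop, I first check cancel_queue to see whether a "cancel" message has arrived.
In the main loop I also add an exception handler. If an error occurs, I put the exception on error_queue: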
self.status = "running"
while True:  # <- or something else
    if not self.cancel_queue.empty():
        self.status = "cancelled"
        break
    try:
        # normal processing
        ...
    except Exception as exc:
        # you can log the exception here for debug
        self.error_queue.put(exc)
        self.status = "error"
        break
    time.sleep(.1)
In the main process:
Run all the MyExecutor instances.
Scan error_queue:
while True:
    if not error_queue.empty():
        # One message is enough: workers only check that the queue is non-empty.
        cancel_queue.put("cancel")
        break
    time.sleep(.1)
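A runnable version of this design might look like the sketch below. It rests on a few assumptions: ThreadPoolExecutor has no "main loop" you can override, so the loop lives in a plain worker function submitted to one executor; queue.Queue replaces multiprocessing.Queue because the workers are threads in a single process; worker(), run(), STEPS and fail_at are hypothetical names used to simulate work and a failure.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor

STEPS = 20  # hypothetical amount of work per worker

def worker(name, fail_at, error_queue, cancel_queue):
    """Hypothetical worker: one unit of work per loop iteration."""
    step = 0
    while True:
        # A non-empty cancel_queue acts as a broadcast stop flag: nobody
        # removes the message, so every worker sees it.
        if not cancel_queue.empty():
            return f"{name}: cancelled"
        try:
            step += 1  # "normal processing" goes here
            if step == fail_at:
                raise RuntimeError(f"{name} failed at step {step}")
            if step == STEPS:
                return f"{name}: finished"
        except Exception as exc:
            error_queue.put(exc)
            return f"{name}: error"
        time.sleep(0.05)

def run(fail_ats):
    error_queue = queue.Queue()
    cancel_queue = queue.Queue()
    with ThreadPoolExecutor(max_workers=len(fail_ats)) as executor:
        futures = [
            executor.submit(worker, f"w{i}", fail_at, error_queue, cancel_queue)
            for i, fail_at in enumerate(fail_ats)
        ]
        # Main loop: scan error_queue while any worker is still running.
        while not all(f.done() for f in futures):
            if not error_queue.empty():
                cancel_queue.put("cancel")
                break
            time.sleep(0.05)
    # The with-block has waited for every worker to return.
    if not error_queue.empty():
        raise error_queue.get()
    return [f.result() for f in futures]
```

run([None, None]) lets both workers finish, while run([None, 3]) raises the RuntimeError from w1 well before w0 would have completed its 20 steps.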