I parallelized a large CPU-bound data-processing task using concurrent.futures.ProcessPoolExecutor, as shown below.
with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor:
    futures_ocr = [
        executor.submit(MyProcessor, folder)
        for folder in sub_folders
    ]
    is_cancel = wait_for(futures_ocr)
    if is_cancel:
        print 'shutting down executor'
        executor.shutdown()
def wait_for(futures):
    """Handles the future tasks after completion"""
    cancelled = False
    try:
        for future in concurrent.futures.as_completed(futures, timeout=200):
            try:
                result = future.result()
                print 'successfully finished processing folder: ', result.source_folder_path
            except concurrent.futures.TimeoutError:
                print 'TimeoutError occurred'
            except TypeError:
                print 'TypeError occurred'
    except KeyboardInterrupt:
        print '****** cancelling... *******'
        cancelled = True
        for future in futures:
            future.cancel()
    return cancelled
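As a side note on the cancellation loop above, Future.cancel() only succeeds for futures that have not started yet; a future whose task is already running cannot be cancelled. A minimal sketch (using threads for brevity, and a hypothetical helper name, since this is an illustration rather than the code above):

```python
import concurrent.futures
import time

def check_cancel_semantics():
    """Show that cancel() fails on a running future but succeeds on a pending one."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        running = executor.submit(time.sleep, 0.5)   # starts immediately
        time.sleep(0.1)                              # let the first task begin
        pending = executor.submit(time.sleep, 0.5)   # queued behind it
        # cancel() returns False for the running future, True for the queued one.
        return running.cancel(), pending.cancel()
```

So cancelling futures after a KeyboardInterrupt only prevents queued work from starting; it does not stop work that is already in progress.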
For some folders the worker processes appear to hang for a long time, not because of a bug in the code but because of the nature of the files being processed. So I would like to time these processes out so they return once they exceed a certain time limit; the pool could then use that process for the next available task.
Adding a timeout to the as_completed() function raises an error when the timeout expires:
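For context, a minimal sketch (using threads so it runs anywhere; the hypothetical helper name run_with_timeout is mine) showing that as_completed() raises its TimeoutError from the iteration itself, so a handler wrapping only future.result() will not catch it:

```python
import concurrent.futures
import time

def slow_task(seconds):
    """Simulates a long-running unit of work."""
    time.sleep(seconds)
    return seconds

def run_with_timeout():
    """Return True when as_completed() times out before the futures finish."""
    timed_out = False
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(slow_task, 1) for _ in range(2)]
        try:
            # TimeoutError is raised here, during iteration, not inside
            # future.result() -- so the except must wrap the loop.
            for future in concurrent.futures.as_completed(futures, timeout=0.2):
                future.result()
        except concurrent.futures.TimeoutError:
            timed_out = True
            for f in futures:
                f.cancel()
    return timed_out
```

Note that even after the timeout fires, the underlying tasks keep running; the exception only stops the waiting.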
Traceback (most recent call last):
  File "call_ocr.py", line 96, in <module>
    main()
  File "call_ocr.py", line 42, in main
    is_cancel = wait_for(futures_ocr)
  File "call_ocr.py", line 59, in wait_for
    for future in concurrent.futures.as_completed(futures, timeout=200):
  File "/Users/saurav/.pyenv/versions/ocr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 216, in as_completed
    len(pending), len(fs)))
concurrent.futures._base.TimeoutError: 3 (of 3) futures unfinished
What am I doing wrong here, and what is the best way to stop a timed-out process and return that process to the process pool for the next task?
Answer 0 (score: 0)
The concurrent.futures implementation does not support this use case. The timeout that can be passed to its functions and methods sets how long to wait for a result, but it has no effect on the actual computation itself.

The pebble library supports this kind of use case.
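To illustrate the distinction the answer draws, here is a minimal standard-library sketch (not the pebble API; the helper name run_with_hard_timeout is hypothetical) that enforces a hard timeout by running the work in its own process and terminating it, which is the pattern a library like pebble wraps in a pool interface:

```python
import multiprocessing
import time

# The POSIX "fork" start method keeps this sketch self-contained; on
# Windows, the default "spawn" method would require a __main__ guard.
_ctx = multiprocessing.get_context("fork")

def run_with_hard_timeout(target, args, timeout):
    """Run target(*args) in its own process; kill it if it exceeds timeout.

    Returns True if the work finished in time, False if it was terminated.
    Unlike a Future timeout, terminate() actually stops the computation,
    freeing the CPU for the next task.
    """
    proc = _ctx.Process(target=target, args=args)
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()
        proc.join()
        return False
    return True

if __name__ == "__main__":
    # A 5-second sleep is cut off after roughly half a second:
    print(run_with_hard_timeout(time.sleep, (5,), 0.5))  # prints False
```

The trade-off is that terminating a process discards any partial work and any state it held, so each task should be independent and restartable.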