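In some kinds of data pipeline it is useful to wait for an external process to finish, for example by watching for a file to be written. Implemented naively in dask, this becomes a long-running task that blocks a worker thread for its entire duration: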
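```python
import time
from os.path import exists

from dask import delayed

def wait_for_file(filename='some_filename', max_wait_time=600):
    start_time = time.time()
    while True:
        if time.time() - start_time > max_wait_time:
            raise Exception('Timeout')
        if exists(filename):
            return filename
        time.sleep(0.1)  # this sleep holds the worker thread the whole time

file_exists = delayed(wait_for_file)()
res = delayed(process_file)(file_exists)  # process_file: the downstream step
```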
How can I make this code avoid blocking the worker?
Answer
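Using `secede` and `rejoin`, described at http://dask.pydata.org/en/latest/futures.html#submit-tasks-from-tasks, you can write the wait function as follows. `secede()` removes the task's thread from the worker's thread pool, which opens a scheduling slot so other tasks can run while the loop polls; `rejoin()` blocks until a slot frees up again and then puts the thread back in the pool.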
```python
import time
from os.path import exists

import distributed

def wait_for_file(filename='some_filename', max_wait_time=600):
    start_time = time.time()
    # detach from the worker's thread pool so the slot can run other tasks
    distributed.secede()
    try:
        while True:
            if time.time() - start_time > max_wait_time:
                raise TimeoutError('Timeout')
            if exists(filename):
                return filename
            time.sleep(0.1)
    finally:
        # rejoin the thread pool exactly once, whether the loop returned
        # or raised, so the worker regains its thread either way
        distributed.rejoin()
```