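I have some Python code that uses ThreadPoolExecutor to farm out expensive work, and I want to keep track of which items have completed so that, if I have to restart the system, I don't have to redo the ones that already finished. In a single-threaded context, I could just mark what I've done in a shelf. Here's a naive port of that idea to a multithreaded environment: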
from concurrent.futures import ThreadPoolExecutor
import subprocess
import shelve

def do_thing(done, x):
    # Don't let the command run in the background; we want to be able to tell when it's done
    _ = subprocess.check_output(["some_expensive_command", x])
    done[x] = True

futs = []
with shelve.open("done") as done:
    with ThreadPoolExecutor(max_workers=18) as executor:
        for x in things_to_do:
            if done.get(x, False):
                continue
            futs.append(executor.submit(do_thing, done, x))
            # Can't run `done[x] = True` here--have to wait until do_thing finishes
        for future in futs:
            future.result()
            # Don't want to wait until here to mark stuff done, as the whole system might be killed at some point
            # before we get through all of things_to_do
Can I get away with this? The documentation for shelve doesn't contain any guarantees about thread safety, so I'm thinking no.
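So what's the simple way to handle this? I thought that perhaps sticking done[x] = True in future.add_done_callback would do it, but that will often run in the same thread as the future itself. Perhaps there's a locking mechanism that plays nicely with ThreadPoolExecutor? That seems cleaner to me than writing a loop that sleeps and then checks for completed futures.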
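For comparison with the sleep-and-check loop mentioned above, here is a minimal sketch of one lock-free alternative: `concurrent.futures.as_completed` yields each future as it finishes, so the main thread can record progress incrementally and no worker thread ever touches the shelf. The `do_thing` body is a stand-in for the real subprocess call, and `run_all`, `shelf_path`, and the sample items are hypothetical names made up for this illustration.

```python
import os
import shelve
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed


def do_thing(x):
    # Stand-in for the expensive subprocess call (assumption for this sketch).
    return x.upper()


def run_all(things_to_do, shelf_path, max_workers=4):
    """Run do_thing for every item not already recorded in the shelf."""
    newly_done = []
    with shelve.open(shelf_path) as done:
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            # Map each future back to its input so it can be recorded later.
            futs = {
                executor.submit(do_thing, x): x
                for x in things_to_do
                if not done.get(x, False)
            }
            # as_completed yields futures as they finish; only this (main)
            # thread reads or writes the shelf, so no lock is required.
            for fut in as_completed(futs):
                x = futs[fut]
                fut.result()  # re-raises any exception from the worker
                done[x] = True
                done.sync()  # flush to disk in case the process is killed
                newly_done.append(x)
    return newly_done


# Usage: a second run skips whatever the first run already recorded.
shelf_path = os.path.join(tempfile.mkdtemp(), "done")
run_all(["a", "b", "c"], shelf_path)
remaining = run_all(["a", "b", "c", "d"], shelf_path)
```

Because the shelf is opened and used from a single thread, this sidesteps the thread-safety question rather than answering it. An item is only marked done after its future has completed without raising.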
Answer 0 (score: 1)