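I am running some repetitive tasks on a large number of files, so I would like to run these tasks in parallel.
Each task lives in a function that looks like this: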
def function(file):
    ...
    return var1, var2, ...
I managed to run all of them in parallel using:
import concurrent.futures as Cfut
executor = Cfut.ProcessPoolExecutor(Nworkers)
futures = [executor.submit(function, file) for file in list_files]
Cfut.wait(futures)
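What I would like to do is run the files through the function in groups and collect the results.
This is what I have written so far: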
def function(files):
    for file in files:
        ...
        print('var1', 'var2', ...)

def multiprocess_loop_grouped(function, param_list, group_size, Nworkers):
    # function : function that is run in parallel
    # param_list : list of items
    # group_size : size of the groups
    # Nworkers : number of groups/items running at the same time
    executor = Cfut.ProcessPoolExecutor(Nworkers)
    futures = [executor.submit(function, param)
               for param in grouper(param_list, group_size)]
    Cfut.wait(futures)
This works fine if I only print var1, var2, etc., but I need to collect these results into an array or some other container.
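The grouper helper called in the code above is not shown in the question. A minimal sketch, assuming it simply splits the list into consecutive chunks and lets the last chunk be shorter (the author's actual helper may differ; the itertools recipe of the same name, for instance, pads the last group with a fill value by default):

```python
def grouper(items, group_size):
    """Split `items` into consecutive lists of at most `group_size` elements."""
    items = list(items)
    return [items[start:start + group_size]
            for start in range(0, len(items), group_size)]


print(grouper(["a", "b", "c", "d", "e"], 2))
# prints [['a', 'b'], ['c', 'd'], ['e']]
```

Returning plain lists keeps each group picklable, which is required for passing it to a worker process.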
Answer 0 (score: 0)
Using Andrej Kesely's comment and the multiprocessing library, I managed to write something that works, using a shared dictionary.
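import concurrent.futures as Cfut
import multiprocessing as mlp

def function(files, dic):
    for file in files:
        ...
        # i : unique integer index of this file (0 .. N-1), assumed to be
        # set in the elided code above; the return statement below relies on it
        dic[i] = var1, var2, ...

def multiprocess_loop_grouped(function, param_list, group_size, Nworkers):
    # function : function that is run in parallel
    # param_list : list of items
    # group_size : size of the groups
    # Nworkers : number of groups/items running at the same time
    manager = mlp.Manager()
    dic = manager.dict()  # dictionary shared between the worker processes
    executor = Cfut.ProcessPoolExecutor(Nworkers)
    futures = [executor.submit(function, param, dic)
               for param in grouper(param_list, group_size)]
    Cfut.wait(futures)
    # Read the results back in index order
    return [dic[i] for i in range(len(dic))]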
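As an alternative to the shared dictionary, the values returned by each group can be read back from the futures themselves with result(). The following is only a sketch under stated assumptions: process_group is a hypothetical stand-in for the real per-file work, the groups are built by slicing the list, and executor_cls is an extra parameter that is not in the original code.

```python
import concurrent.futures as Cfut


def process_group(files):
    # Hypothetical stand-in for the real per-file work:
    # returns one (file, value) tuple per file in the group.
    return [(file, len(file)) for file in files]


def multiprocess_loop_grouped(function, param_list, group_size, Nworkers,
                              executor_cls=Cfut.ProcessPoolExecutor):
    # executor_cls is an assumed extra parameter, so that a
    # ThreadPoolExecutor can be swapped in (for example when testing).
    groups = [param_list[start:start + group_size]
              for start in range(0, len(param_list), group_size)]
    with executor_cls(Nworkers) as executor:
        futures = [executor.submit(function, group) for group in groups]
        # Reading the futures in submission order keeps the results
        # in the same order as param_list; result() blocks until done.
        return [result for future in futures for result in future.result()]
```

It could be called as results = multiprocess_loop_grouped(process_group, list_files, 10, 4). On platforms that start workers with spawn (Windows, macOS), that call needs to sit under an if __name__ == "__main__": guard. An exception raised inside a worker is re-raised by result(), so failures are not silently lost.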