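I need to accumulate results over many trees for some queries that produce large outputs. Since every tree can be processed independently, the problem is embarrassingly parallel, except that the results have to be summed and I cannot hold the intermediate results for all trees in memory. Below is a simple example of the problematic code, which keeps all intermediate results in memory (in the real problem the functions would of course all be different, since otherwise this would be duplicated work).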
import numpy as np
from joblib import Parallel, delayed

functions = [[abs, np.round] for i in range(500)]  # Dummy functions
functions = [function for sublist in functions for function in sublist]
X = np.random.normal(size=(5, 5))  # Dummy data

def helper_function(function, X=X):
    return function(X)

# All 1000 intermediate results are held in memory at once
results = Parallel(n_jobs=-1,)(
    map(delayed(helper_function), [functions[i] for i in range(1000)]))

results_out = np.zeros(results[0].shape)
for result in results:
    results_out += result
One possible solution would be the following modification:
import numpy as np
from joblib import Parallel, delayed

functions = [[abs, np.round] for i in range(500)]  # Dummy functions
functions = [function for sublist in functions for function in sublist]
X = np.random.normal(size=(5, 5))  # Dummy data
results_out = np.zeros(X.shape)  # Accumulator with the same shape as each result

def helper_function(function, X=X, results=results_out):
    result = function(X)
    results += result  # Every task updates the same accumulator in place

Parallel(n_jobs=-1,)(
    map(delayed(helper_function), [functions[i] for i in range(1000)]))
But this could lead to race conditions, so it is not ideal.
Do you have any suggestions for doing this without storing the intermediate results while still keeping it parallel?
Answer 0 (score: 0)
from math import sqrt
from joblib import Parallel, delayed

with Parallel(n_jobs=2) as parallel:
    accumulator = 0.
    n_iter = 0
    while accumulator < 1000:
        results = parallel(delayed(sqrt)(accumulator + i ** 2)
                           for i in range(5))
        accumulator += sum(results)  # synchronization barrier
        n_iter += 1
You can do the computation in chunks and reduce each chunk before you run out of memory.
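Applied to the example in the question, the chunked approach might look like the sketch below. The names `apply_function` and `chunked_parallel_sum` and the `chunk_size` value are made up for illustration; pick a chunk size that fits your memory budget. Only one chunk of intermediate results exists at any time, and the summation happens in the parent process, so the workers never touch a shared accumulator.

```python
import numpy as np
from joblib import Parallel, delayed


def apply_function(function, X):
    """Apply one function to the data; stands in for processing one tree."""
    return function(X)


def chunked_parallel_sum(functions, X, chunk_size=100, n_jobs=2):
    """Sum function(X) over all functions, keeping at most chunk_size results in memory."""
    total = np.zeros(X.shape)
    # Reuse the same pool of workers for every chunk.
    with Parallel(n_jobs=n_jobs) as parallel:
        for start in range(0, len(functions), chunk_size):
            chunk = functions[start:start + chunk_size]
            results = parallel(delayed(apply_function)(f, X) for f in chunk)
            # Reduce in the parent process only, then let the chunk be freed.
            for result in results:
                total += result
    return total


functions = [abs, np.round] * 500  # Dummy functions, as in the question
X = np.random.normal(size=(5, 5))  # Dummy data
results_out = chunked_parallel_sum(functions, X, chunk_size=100)
```

Each chunk ends with a synchronization barrier, so larger chunks keep the workers busier while smaller chunks use less memory.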