I am writing an iterative algorithm whose most time-consuming part is the function oneiter(), shown below:

def oneiter(M,h):
    res = []
    for i in range(M.shape[2]):
        res.append(f(M[:,:,i],h))
    return res

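Here M is a large n by n by n numpy array, and f is a function that runs some kind of regression using M[:,:,i] and h. M stays the same across all iterations, while h can change.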

To speed things up, I tried using joblib to parallelize the for loop in oneiter:

from joblib import Parallel, delayed
res = Parallel(n_jobs=4)(delayed(f)(M[:,:,i],h) for i in range(M.shape[2]))

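This turned out to be slower than the plain loop. I am new to writing parallel Python code. Can anyone explain why this happens, and is there a better way to do it?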
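In case it helps to reproduce this, here is a minimal self-contained sketch of the comparison I am timing. The f below is only a hypothetical stand-in (an ordinary least-squares fit), not my actual regression, and the compare helper, the array size, and the seed are made up for illustration.

```python
import time
import numpy as np


def f(Mi, h):
    # Hypothetical stand-in for the real regression:
    # least-squares solution of Mi @ coef ~= h.
    coef, *_ = np.linalg.lstsq(Mi, h, rcond=None)
    return coef


def oneiter(M, h):
    # Serial version: one regression per slice along the third axis.
    return [f(M[:, :, i], h) for i in range(M.shape[2])]


def oneiter_parallel(M, h, n_jobs=4):
    # Same work dispatched through joblib; imported here so the
    # serial version can be run without joblib installed.
    from joblib import Parallel, delayed
    return Parallel(n_jobs=n_jobs)(
        delayed(f)(M[:, :, i], h) for i in range(M.shape[2])
    )


def compare(n=50, n_jobs=4, seed=0):
    # Build random test data (an assumption; my real M is different),
    # time both versions, and check that they agree.
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n, n))
    h = rng.standard_normal(n)

    t0 = time.perf_counter()
    serial = oneiter(M, h)
    t1 = time.perf_counter()
    parallel = oneiter_parallel(M, h, n_jobs)
    t2 = time.perf_counter()

    assert all(np.allclose(a, b) for a, b in zip(serial, parallel))
    return t1 - t0, t2 - t1  # (serial seconds, parallel seconds)
```

Calling compare() returns the two wall-clock times as a tuple, so the serial and joblib versions can be compared for different values of n and n_jobs.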