Question

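I have an embarrassingly parallel for loop in Python (to be repeated n times). Each iteration performs a complex task and returns a mix of numpy arrays and dicts (so there is no single number to fill into an array; for now, think of the outputs as a complex bundle). The repetitions don't need to run in any particular order - I just need to be able to uniquely identify each iteration i of the n (e.g. to save results independently for each repetition). In fact, they don't even need to be identified by an index or counter; any unique identifier will do, since they don't need to be ordered (I can easily fill them back into a bigger array).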

To give a more concrete example, I would like to parallelize the following task:

import os
import numpy as np

def do_complex_task(complex_input1, input2, input3, input_n):
  "all important computation done here - independent of i or n"

  inner_result1, inner_result2 = np.zeros(100), np.zeros(100)
  for smaller_input in complex_input1:
    inner_result1 = do_another_complex_task(smaller_input, input2, input3, input_n)
    inner_result2 = do_second_complex_task(smaller_input, input2, input3, input_n)

  # do some more to produce few more essential results
  dict_result = blah()

  unique_identifier = get_unique_identifier_for_this_thread() # I don't know how

  # save results for each repetition independently before returning, 
  # instead of waiting for full computation to be done which can take a while
  out_path = os.path.join(out_dir, 'repetition_{}.pkl'.format(unique_identifier))

  return inner_result1, inner_result2, inner_result_n, dict_result


def main_compute():
  "main method to run the loop"

  n = 256 # ideally any number, but multiples of 4 possible, for even parallelization.

  result1  = np.zeros([n, 100])
  result2  = np.zeros([n, 100])
  result_n = np.zeros([n, 100])
  dict_result = [None] * n  # pre-sized so that dict_result[i] can be assigned

  # this for loop does not need to be computed in any order (range(n) is an illustration)
  # although this order would be ideal, as it makes it easy to populate results into a bigger array
  for i in range(n):
    # this computation has nothing to do with i or n!
    result1[i, :], result2[i, :], result_n[i, :], dict_result[i] = do_complex_task(complex_input1, input2, input3, input_n)

  # I need to parallelize the above loop to speed up stupidly parallel processing.


if __name__ == '__main__':
    pass

I have read fairly widely, and it is not clear to me which strategy would be smarter and simpler, without any reliability issues.

complex_input1 can also be large - so I would prefer to avoid a lot of I/O overhead from pickling.

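I can certainly return a list (with all the complex parts), which gets appended to a master list and can later be assembled into whatever format I like (rectangular arrays etc.). This can be done easily with joblib, for example. However, I'm trying to learn from you all to identify good solutions.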

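EDIT: I think I am settling on the following solution. Let me know what could go wrong with it, or how I can improve it further in terms of speed, absence of side effects, etc. After a few unstructured trials on my laptop, it is not clear whether it gives a definite speed-up.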

from functools import partial
from multiprocessing import Pool, Manager
import numpy as np

chunk_size = int(np.ceil(num_repetitions/num_procs))
with Manager() as proxy_manager:
    shared_inputs = proxy_manager.list([complex_input1, input2, another, blah])
    partial_func_holdout = partial(key_func_doing_work, *shared_inputs)

    with Pool(processes=num_procs) as pool:
        results = pool.map(partial_func_holdout, range(num_repetitions), chunk_size)

Answer 1

There is a built-in solution in the form of multiprocessing.Pool.map:

import multiprocessing
from functools import partial

def do_task(a, b):
    return (42, {'x': a * 2, 'y': b[::-1]})

if __name__ == '__main__':
    a_values = ['Hello', 'World']
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.map(partial(do_task, b='fixed b value'), a_values)
    print(results)

After this, results will contain the results in the same order as a_values.
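
Because pool.map preserves input order, mapping over range(n) lets the repetition index serve as the unique identifier the question asks for, and the ordered list of results can then be stacked into rectangular arrays. The following is only a minimal sketch under that assumption: run_repetition, assemble, main and the scale parameter are hypothetical stand-ins, not part of the original code.

```python
import multiprocessing
from functools import partial

import numpy as np


def run_repetition(i, scale):
    """Hypothetical stand-in for the real task; `i` doubles as the unique ID."""
    rng = np.random.default_rng(i)  # seeded per repetition for reproducibility
    result1 = scale * rng.random(100)
    result2 = np.full(100, float(i))
    dict_result = {"repetition": i, "mean": float(result1.mean())}
    return result1, result2, dict_result


def assemble(results):
    """Stack per-repetition tuples (in input order) into rectangular arrays."""
    result1 = np.stack([r[0] for r in results])
    result2 = np.stack([r[1] for r in results])
    dict_results = [r[2] for r in results]
    return result1, result2, dict_results


def main(n=8):
    with multiprocessing.Pool(processes=4) as pool:
        # pool.map returns results in the same order as range(n)
        results = pool.map(partial(run_repetition, scale=2.0), range(n))
    result1, result2, dict_results = assemble(results)
    print(result1.shape, result2.shape, len(dict_results))


if __name__ == "__main__":
    main()
```

Row i of each stacked array then belongs to repetition i, so no extra bookkeeping is needed to match results to repetitions.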

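The requirement is that the arguments and return values are picklable. Apart from that they can be complicated, although with large amounts of data there may be some performance penalty.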
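
One possible way to reduce that penalty when a fixed input is large is to hand it to each worker once, through the Pool initializer, instead of binding it with partial (a partial is serialized along with the task chunks sent to the workers). This is a sketch of that alternative, not something the original answer proposes; _shared, init_worker, do_task and big_input are hypothetical names.

```python
import multiprocessing

_shared = {}  # per-worker storage, filled once by init_worker


def init_worker(big_input):
    """Runs once in each worker process; keeps the large input locally."""
    _shared["big_input"] = big_input


def do_task(i):
    """Hypothetical task: only the small index `i` is pickled per call."""
    data = _shared["big_input"]
    return i, sum(data[i::4])


def main():
    big_input = list(range(1_000))  # stand-in for a large object
    with multiprocessing.Pool(processes=2, initializer=init_worker,
                              initargs=(big_input,)) as pool:
        results = pool.map(do_task, range(4))
    print(results)


if __name__ == "__main__":
    main()
```

With this layout the large object is transferred at most once per worker rather than with the tasks. Whether it gives a measurable speed-up depends on the data size and the start method, so it is worth timing both variants.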

I don't know if this is what you would consider a good solution; I have used it many times and it works well for me.

You can put the return values in a class, but personally I feel that doesn't really offer any benefit, since Python has no static type checking.

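It simply runs up to #processes jobs in parallel. They should be independent, and the order won't matter (I think they are started in the order provided, but they may complete in a different order).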
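
If results should be handled as soon as each job finishes, rather than after the whole map completes, Pool.imap_unordered can be used instead of Pool.map. It yields results in completion order, so each task has to return its own index. The sketch below assumes that variation; do_task, save_result, main and the temporary output directory are hypothetical.

```python
import multiprocessing
import os
import pickle
import tempfile


def do_task(i):
    """Hypothetical task; returns its own index so the caller can place it."""
    return i, {"square": i * i}


def save_result(out_dir, i, payload):
    """Write one repetition to its own file, named by the repetition index."""
    out_path = os.path.join(out_dir, "repetition_{}.pkl".format(i))
    with open(out_path, "wb") as f:
        pickle.dump(payload, f)
    return out_path


def main(n=8):
    out_dir = tempfile.mkdtemp()
    collected = [None] * n
    with multiprocessing.Pool(processes=4) as pool:
        # Results are yielded as they finish, not in submission order.
        for i, payload in pool.imap_unordered(do_task, range(n)):
            save_result(out_dir, i, payload)
            collected[i] = payload  # the index puts it back in its slot
    print(len(os.listdir(out_dir)), "files written to", out_dir)


if __name__ == "__main__":
    main()
```

The same index could equally be used inside the worker to save the file before returning, which matches the per-repetition saving described in the question.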

Example based on this answer.
