Question

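I am doing some computationally heavy work in Python and found the threading module for parallelization. I have a function that performs the calculation and returns an ndarray as its result. Now I would like to know how to parallelize my function and get the calculated array back from each thread.

The following example is heavily simplified, with a lightweight function and calculation.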

import threading
import numpy as np

def calculate_result(factor):
    a = np.linspace(1.0, 1000.0, num=10000)   # just an example
    result = factor * a
    return result

inputs = [1, 2, 3, 4]

for i in range(len(inputs)):
    # args must be a tuple, hence the trailing comma
    t = threading.Thread(target=calculate_result, args=(inputs[i],))
    t.start()
    # Here I want to receive the return value from the thread

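I am looking for a way to get the return value of the function from each thread, because in my task every thread calculates different values.

I found another question (how to get the return value from a thread in python?) where someone was looking for something similar (without ndarrays), and it was handled with ThreadPool and async...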
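
For reference, here is roughly what that ThreadPool/async pattern looks like when applied to the simplified example above. This is a minimal sketch, assuming `multiprocessing.pool.ThreadPool` from the standard library; the names `factor`, `inputs` and `pending` are only illustrative.

```python
from multiprocessing.pool import ThreadPool

import numpy as np


def calculate_result(factor):
    # Same toy calculation as in the simplified example.
    a = np.linspace(1.0, 1000.0, num=10000)
    return factor * a


inputs = [1, 2, 3, 4]

with ThreadPool(processes=len(inputs)) as pool:
    # apply_async returns an AsyncResult right away; the call runs in a pool thread.
    pending = [pool.apply_async(calculate_result, args=(value,)) for value in inputs]
    # get() blocks until that call has finished and hands back its return value.
    results = [job.get() for job in pending]

# results[i] is the ndarray calculated for inputs[i], in submission order.
print(len(results), results[0].shape)
```

Because the results are collected in the order the jobs were submitted, each array can be matched to its input even if the threads finish in a different order.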

--------------------------------------------------------------------------------

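Thank you for your answers! Thanks to your help I am now looking for a way to solve my problem with the multiprocessing module. To give you a better overview of what I am doing, please see the following explanation.

Explanation

My 'input_data' is an ndarray with 282240 elements of type uint32.

In 'calculation_function()' I use a for loop to calculate a result from every 12 bits and put it into 'output_data'.

Because this is very slow, I split input_data into e.g. 4 or 8 parts and calculate each part in calculation_function().

Now I am looking for a way to parallelize the 4 or 8 function calls.

The order of the data is essential, because the data forms an image and every pixel has to be at the correct position. So function call no. 1 calculates the first pixels of the image and the last function call calculates the last pixels.

The calculation works fine and the image can be completely reconstructed by my algorithm, but I need parallelization to speed things up for time-critical aspects.

Summary: One input ndarray is split into 4 or 8 parts. Each part contains 70560 or 35280 uint32 values. From every 12 bits I calculate one pixel, using 4 or 8 function calls. Each function call returns an ndarray with 188160 or 94080 pixels. All return values are put together in order and reshaped into an image.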
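
The sizes in the summary can be checked with a short sketch. It assumes the parts are equally sized and kept in their original order; `fake_part_decoder` is a hypothetical stand-in that only produces an array of the right length, not real pixel values.

```python
import numpy as np

N_INPUT = 282240          # uint32 elements in input_data (from the description)
BITS_PER_ELEMENT = 32
BITS_PER_PIXEL = 12

# Total number of 12-bit pixels contained in the input.
n_pixels = N_INPUT * BITS_PER_ELEMENT // BITS_PER_PIXEL


def fake_part_decoder(part):
    # Hypothetical stand-in: correct output length only, no real decoding.
    return np.zeros(len(part) * BITS_PER_ELEMENT // BITS_PER_PIXEL, dtype=np.uint32)


input_data = np.arange(N_INPUT, dtype=np.uint32)

for n_parts in (4, 8):
    parts = np.split(input_data, n_parts)              # equal-sized parts, original order
    outputs = [fake_part_decoder(p) for p in parts]    # sequential for now
    image_flat = np.concatenate(outputs)               # part 0 first, last part last
    print(n_parts, len(parts[0]), len(outputs[0]), len(image_flat))
# prints:
# 4 70560 188160 752640
# 8 35280 94080 752640
```

The flat result would then be reshaped with `np.reshape(image_flat, (HEIGHT, WIDTH))`; the concrete WIDTH and HEIGHT are not stated here, only that their product has to be 752640.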

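What already works: the calculation is done and I can reconstruct my image.

Problem: the function calls are executed serially, one after another, so every image reconstruction is quite slow.

Main goal: speed up the image reconstruction by parallelizing the function calls.

Code: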

def decompress(payload, WIDTH, HEIGHT):
    # INPUTS / OUTPUTS
    n_threads = 4
    # np.frombuffer replaces np.fromstring, which is deprecated for binary data
    img_input = np.frombuffer(payload, dtype=np.uint32)
    img_output = np.zeros(WIDTH * HEIGHT, dtype=np.uint32)
    n_elements_part = len(img_input) // n_threads
    n_pixels_part = n_elements_part * 8 // 3            # 32 bits per element / 12 bits per pixel
    input_part = np.zeros((n_threads, n_elements_part), dtype=np.uint32)
    output_part = np.zeros((n_threads, n_pixels_part), dtype=np.uint32)

    # DEFINE PARTS (here 4 different ones)
    # plain int is used because the np.int alias was removed in NumPy 1.24
    start = np.zeros(n_threads, dtype=int)
    end = np.zeros(n_threads, dtype=int)
    for i in range(n_threads):
        start[i] = i * n_elements_part
        end[i] = (i + 1) * n_elements_part - 1

    # COPY IMAGE DATA
    for idx in range(n_threads):
        input_part[idx, :] = img_input[start[idx]:end[idx] + 1]

    # the call in this loop is the one that should be parallelized
    # (decompress_part2 is my own function and is not shown here)
    for idx in range(n_threads):
        output_part[idx, :] = decompress_part2(input_part[idx], output_part[idx])

    # COPY PARTS INTO THE IMAGE
    img_output[0     : 188160] = output_part[0, :]
    img_output[188160: 376320] = output_part[1, :]
    img_output[376320: 564480] = output_part[2, :]
    img_output[564480: 752640] = output_part[3, :]

    # RESHAPE IMAGE
    img_output = np.reshape(img_output, (HEIGHT, WIDTH))

    return img_output

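Please don't mind my beginner programming style :) I am just looking for a solution for how to parallelize the function calls with the multiprocessing module and get the returned ndarrays back.

Thank you very much for your help!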

Answer 1

You can use a process pool from the multiprocessing module:

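# multiprocessing.dummy.Pool has the same API as multiprocessing.Pool but is backed by threads
from multiprocessing.dummy import Pool

def test(a):
    return a

p = Pool(3)
# starmap unpacks each tuple from zip() into the arguments of test
a = p.starmap(test, zip([1, 2, 3]))
print(a)
p.close()
p.join()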
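
The same pool pattern could be applied to the parts from the question roughly like this. It is only a sketch: `decode_part` is a hypothetical stand-in for `decompress_part2`, whose body is not shown in the question, and `decode_parallel` is an illustrative name.

```python
from multiprocessing.dummy import Pool   # thread-backed pool, same API as multiprocessing.Pool

import numpy as np


def decode_part(part):
    # Hypothetical stand-in for decompress_part2: 3 uint32 words -> 8 output values.
    # It just repeats the first word of each group so the ordering can be checked.
    groups = part.reshape(-1, 3)
    return np.repeat(groups[:, 0], 8)


def decode_parallel(img_input, n_parts=4):
    parts = np.split(img_input, n_parts)
    with Pool(n_parts) as pool:
        # map() returns the results in the same order as the inputs,
        # regardless of which worker finishes first.
        outputs = pool.map(decode_part, parts)
    return np.concatenate(outputs)


img_input = np.arange(282240, dtype=np.uint32)
flat = decode_parallel(img_input)
print(flat.shape)
```

Since `map()` preserves the input order, the concatenated result has every pixel in the position it would have after the sequential loop.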

Answer 2

kar's answer works, but keep in mind that he is using the .dummy module, which may be limited by the GIL. More information about it: multiprocessing.dummy in Python is not utilising 100% cpu
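
A process-based variant might look like the sketch below. It assumes the per-part function is defined at module level so it can be pickled; `decode_part` is again a hypothetical stand-in for the real per-part calculation, and `decode_with_processes` is an illustrative name.

```python
from multiprocessing import Pool

import numpy as np


def decode_part(part):
    # Hypothetical stand-in for the real per-part calculation:
    # 3 uint32 words -> 8 output values (first word of each group repeated).
    groups = part.reshape(-1, 3)
    return np.repeat(groups[:, 0], 8)


def decode_with_processes(img_input, n_parts=4):
    parts = np.split(img_input, n_parts)
    with Pool(processes=n_parts) as pool:
        # Each part is pickled and sent to a worker process, and the resulting
        # ndarray is pickled back; map() keeps the input order.
        outputs = pool.map(decode_part, parts)
    return np.concatenate(outputs)


if __name__ == "__main__":
    # The guard matters on platforms that start workers by spawning a new interpreter.
    img_input = np.arange(282240, dtype=np.uint32)
    flat = decode_with_processes(img_input)
    print(flat.shape)
```

Worker processes are not limited by the GIL, but the arrays have to be copied between processes, so whether this is actually faster than the thread pool depends on how expensive the per-part calculation is and should be measured.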
