Dynamically updating a nested dictionary with multiprocessing.Pool (speed issue)

Time: 2018-02-27 05:46:51

Tags: python-3.x dictionary multiprocessing pool

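I wrote a simple piece of code to understand how the lack of communication between child processes leads to random results when using multiprocessing.Pool. I pass in a nested dictionary as a DictProxy object created by multiprocessing.Manager: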

manager = Manager()
my_dict = manager.dict()
my_dict['nested'] = nested

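This proxy is passed to a pool of 16 processes. The nested dictionary is defined in the code below. The function my_function simply squares each number stored in the elements of the nested dictionary.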

As expected, because threads share memory, I get the correct result when I use multiprocessing.dummy:

{0: 1, 1: 4, 2: 9, 3: 16}
{0: 4, 1: 9, 2: 16, 3: 25}
{0: 9, 1: 16, 2: 25, 3: 36}
{0: 16, 1: 25, 2: 36, 3: 49}
{0: 25, 1: 36, 2: 49, 3: 64}
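
The thread-based run is not shown in the question, so the following is only a minimal sketch of what it might look like. It assumes a plain dict is shared between the threads instead of a Manager proxy, and the names square_element and run_with_threads are made up for illustration.

```python
from multiprocessing.dummy import Pool  # thread-backed Pool with the same API


def square_element(args):
    """Square every value of one element's 'data' dict, in place."""
    index, shared = args
    # Threads share memory, so this is the very same dict the caller holds.
    data = shared['nested']['elements'][index]['data']
    for key in data:
        data[key] = data[key] ** 2


def run_with_threads():
    # Same values as in the question: element k holds i + k for i in 0..3.
    elements = [{'data': {i: i + offset for i in range(4)}}
                for offset in range(1, 6)]
    shared = {'nested': {'elements': elements}}
    with Pool(4) as pool:
        pool.map(square_element, [(i, shared) for i in range(len(elements))])
    return shared


if __name__ == '__main__':
    for element in run_with_threads()['nested']['elements']:
        print(element['data'])
```

Each thread writes only to its own element of one shared object, so no update can be overwritten by another thread.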

But when I use multiprocessing, the result is incorrect and completely random from run to run. One example of an incorrect result:

{0: 1, 1: 2, 2: 3, 3: 4}
{0: 4, 1: 9, 2: 16, 3: 25}
{0: 3, 1: 4, 2: 5, 3: 6}
{0: 16, 1: 25, 2: 36, 3: 49}
{0: 25, 1: 36, 2: 49, 3: 64}

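In this particular run, the 'data' in elements 1 and 3 (the first and third rows of the output) was not updated. I understand that this happens because of the lack of communication between the child processes, which prevents the "updated" nested dictionary in each child process from being properly sent to the others. However, can someone help me use Manager.Queue to organize this communication between the child processes and get the correct results with minimal runtime?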
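
The lost updates can be reproduced deterministically without starting any processes. The sketch below is an illustration, not the real proxy: CopyOnReadDict is a hypothetical stand-in that imitates one property of a Manager dict, namely that reading a nested value returns a pickled copy, so changes to that copy reach the shared store only when the whole value is assigned back.

```python
import pickle


class CopyOnReadDict:
    """Hypothetical stand-in for a Manager dict proxy: reads return copies."""

    def __init__(self):
        self._store = {}

    def __getitem__(self, key):
        return pickle.loads(self._store[key])  # fresh copy on every read

    def __setitem__(self, key, value):
        self._store[key] = pickle.dumps(value)


def square_element(snapshot, index):
    """Square one element's data inside a private snapshot."""
    data = snapshot['elements'][index]['data']
    for key in data:
        data[key] = data[key] ** 2


shared = CopyOnReadDict()
shared['nested'] = {'elements': [{'data': {0: 1, 1: 2}},
                                 {'data': {0: 2, 1: 3}}]}

# Two "workers" read the whole nested dict before either one writes back.
snapshot_a = shared['nested']
snapshot_b = shared['nested']
square_element(snapshot_a, 0)
square_element(snapshot_b, 1)
shared['nested'] = snapshot_a
shared['nested'] = snapshot_b  # overwrites worker A's squared element 0

result = shared['nested']
print(result['elements'][0]['data'])  # {0: 1, 1: 2}: A's update is lost
print(result['elements'][1]['data'])  # {0: 4, 1: 9}
```

Whichever worker writes last wins for the entire 'nested' value, which would explain why a different set of elements stays unsquared on each run.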

Code (Python 3.5)

from multiprocessing import Pool, Manager
import numpy as np

def my_function(A):
    arg1 = A[0]
    my_dict = A[1]

    temporary_dict = my_dict['nested']
    for arg2 in np.arange(len(my_dict['nested']['elements'][arg1]['data'])):
        temporary_dict['elements'][arg1]['data'][arg2] = temporary_dict['elements'][arg1]['data'][arg2] ** 2 

    my_dict['nested'] = temporary_dict


if __name__ == '__main__':


    # nested dictionary definition
    strs1 = {}
    strs2 = {}
    strs3 = {}
    strs4 = {}
    strs5 = {}
    strs1['data'] = {}
    strs2['data'] = {}
    strs3['data'] = {}
    strs4['data'] = {}
    strs5['data'] = {}

    for i in [0,1,2,3]:
        strs1['data'][i] = i + 1
        strs2['data'][i] = i + 2
        strs3['data'][i] = i + 3
        strs4['data'][i] = i + 4
        strs5['data'][i] = i + 5

    nested = {}
    nested['elements'] = [strs1, strs2, strs3, strs4, strs5]
    nested['names'] = ['series1', 'series2', 'series3', 'series4', 'series5']

    # parallel processing
    pool = Pool(processes = 16)
    manager = Manager()
    my_dict = manager.dict()
    my_dict['nested'] = nested

    sequence = np.arange(len(my_dict['nested']['elements']))
    pool.map(my_function, ([seq,my_dict] for seq in sequence))

    pool.close()
    pool.join()

    # printing the data in all elements of the nested dictionary
    print(my_dict['nested']['elements'][0]['data'])
    print(my_dict['nested']['elements'][1]['data'])
    print(my_dict['nested']['elements'][2]['data'])
    print(my_dict['nested']['elements'][3]['data'])
    print(my_dict['nested']['elements'][4]['data'])

One workaround that solves this problem and gives the correct result is to use multiprocessing.Lock, but that kills the speed:

from multiprocessing import Pool, Manager, Lock
import numpy as np

def init(l):
    global lock
    lock = l

def my_function(A):
    arg1 = A[0]
    my_dict = A[1]

    with lock:
        temporary_dict = my_dict['nested']
        for arg2 in np.arange(len(my_dict['nested']['elements'][arg1]['data'])):
            temporary_dict['elements'][arg1]['data'][arg2] = temporary_dict['elements'][arg1]['data'][arg2] ** 2 

        my_dict['nested'] = temporary_dict


if __name__ == '__main__':


    # nested dictionary definition
    strs1 = {}
    strs2 = {}
    strs3 = {}
    strs4 = {}
    strs5 = {}
    strs1['data'] = {}
    strs2['data'] = {}
    strs3['data'] = {}
    strs4['data'] = {}
    strs5['data'] = {}

    for i in [0,1,2,3]:
        strs1['data'][i] = i + 1
        strs2['data'][i] = i + 2
        strs3['data'][i] = i + 3
        strs4['data'][i] = i + 4
        strs5['data'][i] = i + 5

    nested = {}
    nested['elements'] = [strs1, strs2, strs3, strs4, strs5]
    nested['names'] = ['series1', 'series2', 'series3', 'series4', 'series5']

    # parallel processing

    manager = Manager()
    l =  Lock()
    my_dict = manager.dict()
    my_dict['nested'] = nested
    pool = Pool(processes = 16, initializer=init, initargs=(l,))

    sequence = np.arange(len(my_dict['nested']['elements']))
    pool.map(my_function, ([seq,my_dict] for seq in sequence))

    pool.close()
    pool.join()

    # printing the data in all elements of the nested dictionary
    print(my_dict['nested']['elements'][0]['data'])
    print(my_dict['nested']['elements'][1]['data'])
    print(my_dict['nested']['elements'][2]['data'])
    print(my_dict['nested']['elements'][3]['data'])
    print(my_dict['nested']['elements'][4]['data'])
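
A possible alternative that avoids both the race and the lock is to let each worker return its result and merge the results in the parent process. This is a sketch, not a tested answer to the question: it assumes each task only needs its own element's 'data', and the helpers square_data and build_nested are invented for this example. No Manager or Queue is used, because pool.map already carries results back to the parent in input order.

```python
from multiprocessing import Pool


def square_data(data):
    """Return a new dict with every value squared; touches no shared state."""
    return {key: value ** 2 for key, value in data.items()}


def build_nested():
    """Rebuild the nested dictionary from the question in compact form."""
    elements = [{'data': {i: i + offset for i in range(4)}}
                for offset in range(1, 6)]
    names = ['series%d' % n for n in range(1, 6)]
    return {'elements': elements, 'names': names}


if __name__ == '__main__':
    nested = build_nested()

    # Send each worker only the data it needs and collect the return values.
    with Pool(processes=4) as pool:
        squared = pool.map(square_data,
                           [element['data'] for element in nested['elements']])

    # Merge in the parent, where there is exactly one writer.
    for element, data in zip(nested['elements'], squared):
        element['data'] = data

    for element in nested['elements']:
        print(element['data'])
```

Because only the parent writes to nested, the read-modify-write on the shared proxy disappears, and each task pickles one small dict instead of the whole structure. Whether this is faster than the lock version for real workloads would need to be measured.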

0 answers:

No answers