Question

I have code in which I need to read an Excel file and store the information in dictionaries.

I have to use multiprocessing.Manager() to create the dictionaries so that I can retrieve the computed outputs from a function that I run with multiprocessing.Process.

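The problem is that when multiprocessing.Manager() and manager.dict() are used to create the dictionaries, it takes about 400 times longer than using plain dict() (and dict() is not a shared-memory structure).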

Here is sample code that demonstrates the difference:

import xlrd
import multiprocessing
import time

def DictManager(inp1, inp2):
    # Each call starts a new manager server process
    manager = multiprocessing.Manager()
    Dict = manager.dict()
    Dict['input1'] = inp1
    Dict['input2'] = inp2
    Dict['Output1'] = None
    Dict['Output2'] = None
    return Dict

def DictNoManager(inp1, inp2):
    Dict = dict()
    Dict['input1'] = inp1
    Dict['input2'] = inp2
    Dict['Output1'] = None
    Dict['Output2'] = None
    return Dict

def ReadFileManager(excelfile):
    DictList = []
    book = xlrd.open_workbook(excelfile)  # xlrd >= 2.0 only reads .xls, not .xlsx
    sheet = book.sheet_by_index(0)
    for line in range(2, sheet.nrows):
        inp1 = sheet.cell(line,2).value
        inp2 = sheet.cell(line,3).value
        dictionary = DictManager(inp1, inp2)
        DictList.append(dictionary)
    print('Done!')

def ReadFileNoManager(excelfile):
    DictList = []
    book = xlrd.open_workbook(excelfile)
    sheet = book.sheet_by_index(0)
    for line in range(2, sheet.nrows):
        inp1 = sheet.cell(line,2).value
        inp2 = sheet.cell(line,3).value
        dictionary = DictNoManager(inp1, inp2)
        DictList.append(dictionary)
    print('Done!')


if __name__ == '__main__':
    excelfile = 'MyFile.xlsx'

    start = time.time()
    ReadFileNoManager(excelfile)
    end = time.time()
    print('Run time NoManager:', end - start, 's')

    start = time.time()
    ReadFileManager(excelfile)
    end = time.time()
    print('Run time Manager:', end - start, 's')
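In the benchmark above, `DictManager` calls `multiprocessing.Manager()` once per row, and each call starts a separate manager server process; that start-up cost is likely a large part of the slowdown, not just the dictionary operations. A minimal sketch, assuming Python 3, that starts one manager and reuses it for every row's `manager.dict()`. The helper `make_row_dicts` and the sample rows are hypothetical:

```python
import multiprocessing


def make_row_dicts(manager, rows):
    """Build one shared dict per (inp1, inp2) row.

    All proxies are served by the single `manager` passed in, so only one
    manager server process is started regardless of the number of rows.
    """
    dict_list = []
    for inp1, inp2 in rows:
        # manager.dict(**kwargs) creates the shared dict with its initial
        # contents in a single request to the manager process
        dict_list.append(manager.dict(input1=inp1, input2=inp2,
                                      Output1=None, Output2=None))
    return dict_list


if __name__ == '__main__':
    # Hypothetical rows standing in for the (column 2, column 3) Excel values
    rows = [(1.0, 2.0), (3.0, 4.0)]
    # Start ONE manager process and reuse it for all dictionaries
    with multiprocessing.Manager() as manager:
        dict_list = make_row_dicts(manager, rows)
        # .copy() returns an ordinary dict snapshot of each proxy
        print([d.copy() for d in dict_list])
```

Every read or write on a proxy is still a round-trip to the manager process, so access remains slower than on a plain dict; this only removes the per-row process start-up.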

Is there a way to improve the performance of multiprocessing.Manager()?

If not, is there another shared-memory structure I could use in place of what I'm doing to improve performance?

Thanks for your help!

Edit:

My main function uses the following code:

def MyFunction(Dict, otherdata):
    # Perform the calculation and save the results in the shared dictionary
    # (Value1 and Value2 stand for the computed results)
    Dict['Output1'] = Value1
    Dict['Output2'] = Value2

ListOfProcesses = []
for Dict in DictList:
    p = multiprocessing.Process(target=MyFunction, args=(Dict, otherdata))
    p.start()
    ListOfProcesses.append(p)
for p in ListOfProcesses:
    p.join()

Without the manager, I cannot retrieve the outputs.
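With plain dicts, each child process works on its own copy, so writes made in `MyFunction` never reach the parent. One alternative that avoids proxies entirely, sketched under the assumption that each worker only needs to send its two outputs back: give every `Process` a shared `multiprocessing.Queue`, have the worker put `(index, outputs)` on it, and merge the results into the ordinary dicts in the parent. The names `compute`, `worker` and `run_all` are hypothetical, and `compute` is a stand-in for the real calculation:

```python
import multiprocessing


def compute(row, otherdata):
    # Hypothetical stand-in for the real calculation
    return row['input1'] + otherdata, row['input2'] * otherdata


def worker(index, row, otherdata, result_queue):
    # Runs in the child: send the results back instead of mutating `row`,
    # because the child only has a copy of the parent's dict
    out1, out2 = compute(row, otherdata)
    result_queue.put((index, {'Output1': out1, 'Output2': out2}))


def run_all(dict_list, otherdata):
    result_queue = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=worker,
                                args=(i, row, otherdata, result_queue))
        for i, row in enumerate(dict_list)
    ]
    for p in processes:
        p.start()
    # Drain the queue before join(): a child that has put data on a queue
    # may not exit until that data has been consumed
    for _ in processes:
        index, outputs = result_queue.get()
        dict_list[index].update(outputs)
    for p in processes:
        p.join()
    return dict_list


if __name__ == '__main__':
    rows = [{'input1': 1.0, 'input2': 2.0, 'Output1': None, 'Output2': None},
            {'input1': 3.0, 'input2': 4.0, 'Output1': None, 'Output2': None}]
    print(run_all(rows, 10.0))
```

Results can arrive in any order, which is why each one carries the index of the dict it belongs to.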

Answer 1

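As I mentioned in the comments, I suggest reading the Excel file in the main process and then using multiprocessing only for the function calls. Just put your function into apply_function and make sure it returns whatever you need. results will contain the list of results.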

Update: I changed map to starmap to include your extra argument.

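import multiprocessing
from itertools import repeat

import xlrd

def ReadFileNoManager(excelfile):
    DictList = []
    book = xlrd.open_workbook(excelfile)
    sheet = book.sheet_by_index(0)
    for line in range(2, sheet.nrows):
        inp1 = sheet.cell(line, 2).value
        inp2 = sheet.cell(line, 3).value
        dictionary = DictNoManager(inp1, inp2)  # plain dict, as defined in the question
        DictList.append(dictionary)
    print('Done!')
    return DictList

def apply_function(your_dict, otherdata):
    # Put your calculation here and return its outputs
    pass

if __name__ == '__main__':
    excelfile = 'MyFile.xlsx'
    otherdata = None  # placeholder for the extra data your function needs
    dict_list = ReadFileNoManager(excelfile)
    with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
        # starmap (Python 3.3+) calls apply_function(d, otherdata) for each d
        results = pool.starmap(apply_function, zip(dict_list, repeat(otherdata)))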
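A self-contained sketch of the same `Pool.starmap` pattern, assuming Python 3.3+ and using made-up rows in place of the Excel file: `apply_function` returns the outputs instead of writing into a shared dict, and the parent merges them back. The body of `apply_function` and the helper `run_pool` are hypothetical:

```python
import multiprocessing
from itertools import repeat


def apply_function(your_dict, otherdata):
    # Hypothetical calculation; return the outputs instead of mutating
    # your_dict, since the worker only receives a pickled copy of it
    return {'Output1': your_dict['input1'] + otherdata,
            'Output2': your_dict['input2'] * otherdata}


def run_pool(dict_list, otherdata):
    with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
        # starmap unpacks each (dict, otherdata) tuple into two arguments;
        # results come back in the same order as dict_list
        results = pool.starmap(apply_function, zip(dict_list, repeat(otherdata)))
    # Merge the returned outputs into the parent's plain dicts
    for d, outputs in zip(dict_list, results):
        d.update(outputs)
    return dict_list


if __name__ == '__main__':
    dict_list = [{'input1': 1.0, 'input2': 2.0, 'Output1': None, 'Output2': None},
                 {'input1': 3.0, 'input2': 4.0, 'Output1': None, 'Output2': None}]
    print(run_pool(dict_list, 10.0))
```

Only plain dicts are pickled to the workers and plain results are pickled back, so no manager process or proxy round-trips are involved.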

Creating a shared-memory dictionary with multiprocessing.Manager() is too slow
