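I have some code that reads an Excel file and stores the information in dictionaries.
I have to use `multiprocessing.Manager()` to create the dictionaries so that I can retrieve the computed outputs from the function I run with `multiprocessing.Process`.
The problem is that creating a dictionary with `multiprocessing.Manager()` and `manager.dict()` takes about 400 times longer than using a plain `dict()`, and a plain `dict()` cannot be shared between processes.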
Here is sample code that demonstrates the difference:
```python
import xlrd
import multiprocessing
import time


def DictManager(inp1, inp2):
    # Note: every call starts a new manager server process
    manager = multiprocessing.Manager()
    Dict = manager.dict()
    Dict['input1'] = inp1
    Dict['input2'] = inp2
    Dict['Output1'] = None
    Dict['Output2'] = None
    return Dict


def DictNoManager(inp1, inp2):
    Dict = dict()
    Dict['input1'] = inp1
    Dict['input2'] = inp2
    Dict['Output1'] = None
    Dict['Output2'] = None
    return Dict


def ReadFileManager(excelfile):
    DictList = []
    book = xlrd.open_workbook(excelfile)
    sheet = book.sheet_by_index(0)
    for line in range(2, sheet.nrows):  # skip the first two rows
        inp1 = sheet.cell(line, 2).value
        inp2 = sheet.cell(line, 3).value
        dictionary = DictManager(inp1, inp2)
        DictList.append(dictionary)
    print('Done!')


def ReadFileNoManager(excelfile):
    DictList = []
    book = xlrd.open_workbook(excelfile)
    sheet = book.sheet_by_index(0)
    for line in range(2, sheet.nrows):  # skip the first two rows
        inp1 = sheet.cell(line, 2).value
        inp2 = sheet.cell(line, 3).value
        dictionary = DictNoManager(inp1, inp2)
        DictList.append(dictionary)
    print('Done!')


if __name__ == '__main__':
    # Note: xlrd 2.0 and later only read .xls; .xlsx requires xlrd < 2.0
    excelfile = 'MyFile.xlsx'
    start = time.time()
    ReadFileNoManager(excelfile)
    end = time.time()
    print('Run time NoManager:', end - start, 's')
    start = time.time()
    ReadFileManager(excelfile)
    end = time.time()
    print('Run time Manager:', end - start, 's')
```
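A likely contributor to the ~400x gap in this benchmark is that `DictManager` calls `multiprocessing.Manager()` once per row, and each call starts a new manager server process; every `Dict[...] = ...` assignment is then a separate round trip to that process. A minimal sketch, assuming the rows have already been read into a list of `(inp1, inp2)` pairs (the `rows` list and the `build_shared_dicts` helper are hypothetical), that starts a single manager and reuses it for every dictionary:

```python
import multiprocessing
import time


def build_shared_dicts(manager, rows):
    """Create one dict proxy per (inp1, inp2) row, all served by the same manager."""
    shared = []
    for inp1, inp2 in rows:
        d = manager.dict()
        # One update() call sends all four keys, instead of four separate round trips
        d.update({'input1': inp1, 'input2': inp2, 'Output1': None, 'Output2': None})
        shared.append(d)
    return shared


if __name__ == '__main__':
    # Hypothetical stand-in for the values read from the Excel sheet
    rows = [(i, i * 2) for i in range(200)]

    start = time.time()
    with multiprocessing.Manager() as manager:  # one server process for all dicts
        dicts = build_shared_dicts(manager, rows)
        print('Run time (one Manager):', time.time() - start, 's')
        print(dicts[0].copy())  # copy() returns an ordinary dict
```

Each proxy operation is still inter-process communication, so this narrows the gap but does not make a managed dict as fast as a local `dict`.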
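Is there a way to improve the performance of `multiprocessing.Manager()`?
If not, is there another shared structure I could use instead that would perform better?
Thanks for your help!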
Edit:
My main function uses the following code:
```python
def MyFunction(Dictionary, otherdata):
    # Perform the calculation and save the results in the dictionary
    # (Value1 and Value2 stand for the computed results)
    Dictionary['Output1'] = Value1
    Dictionary['Output2'] = Value2


ListOfProcesses = []
for Dict in DictList:
    p = multiprocessing.Process(target=MyFunction, args=(Dict, otherdata))
    p.start()
    ListOfProcesses.append(p)
for p in ListOfProcesses:
    p.join()
```
Without the manager, I cannot retrieve the outputs.
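The outputs appear to be lost with a plain `dict` because each `multiprocessing.Process` works on its own copy of its arguments, so assignments made in the child never reach the parent. A minimal sketch (the `my_function` worker is a hypothetical stand-in for `MyFunction`) that shows this, plus one way to send the result back explicitly through a `multiprocessing.Queue` instead of a manager:

```python
import multiprocessing


def my_function(d, otherdata, queue=None):
    # Runs in the child process, so this only changes the child's copy of d
    d['Output1'] = d['input1'] + otherdata
    if queue is not None:
        queue.put(d)  # send the updated dict back to the parent explicitly


if __name__ == '__main__':
    d = {'input1': 1, 'Output1': None}

    # 1) Plain dict: the parent's copy is not updated
    p = multiprocessing.Process(target=my_function, args=(d, 10))
    p.start()
    p.join()
    print(d['Output1'])  # None

    # 2) Return the result through a Queue instead
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=my_function, args=(d, 10, q))
    p.start()
    result = q.get()  # read before join() so a pending queue item cannot block the child
    p.join()
    print(result['Output1'])  # 11
```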
Answer (score: 1):
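As I mentioned in the comments, I suggest reading the Excel file in the main process and using multiprocessing only for the function calls. Put your calculation in `apply_function` and make sure it returns whatever you need; `results` will then contain the list of returned values, so no shared dictionary is required.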
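A minimal, self-contained sketch of this pattern, with a hypothetical `apply_function` that takes one row dict and returns a new dict with the outputs filled in (the arithmetic is only a placeholder for the real calculation):

```python
import multiprocessing


def apply_function(row):
    """Hypothetical worker: compute the outputs and return them to the parent."""
    result = dict(row)  # work on a copy; the parent only sees what is returned
    result['Output1'] = row['input1'] + row['input2']
    result['Output2'] = row['input1'] * row['input2']
    return result


if __name__ == '__main__':
    rows = [{'input1': i, 'input2': i + 1, 'Output1': None, 'Output2': None}
            for i in range(5)]
    with multiprocessing.Pool() as pool:
        results = pool.map(apply_function, rows)  # results come back in input order
    print(results[1])  # {'input1': 1, 'input2': 2, 'Output1': 3, 'Output2': 2}
```

The return values are pickled and sent back once per task, which is usually far cheaper than many small proxy calls to a manager.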
Update: I changed `map` to `starmap` so that your extra argument is passed as well.
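```python
import multiprocessing
from itertools import repeat

import xlrd


def ReadFileNoManager(excelfile):
    DictList = []
    book = xlrd.open_workbook(excelfile)
    sheet = book.sheet_by_index(0)
    for line in range(2, sheet.nrows):
        inp1 = sheet.cell(line, 2).value
        inp2 = sheet.cell(line, 3).value
        dictionary = DictNoManager(inp1, inp2)  # plain dict, as in the question
        DictList.append(dictionary)
    print('Done!')
    return DictList


def apply_function(your_dict, otherdata):
    # Do the calculation here and return the outputs;
    # the return value is sent back to the main process
    ...
    return your_dict


if __name__ == '__main__':
    excelfile = 'MyFile.xlsx'
    otherdata = ...  # your extra argument
    dict_list = ReadFileNoManager(excelfile)
    # starmap requires Python 3.3+
    with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
        results = pool.starmap(apply_function, zip(dict_list, repeat(otherdata)))
```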
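If you prefer plain `pool.map`, the constant argument can instead be bound with `functools.partial`. The sketch below (with a hypothetical `apply_function` and `otherdata`) checks that both forms produce the same list:

```python
import functools
import multiprocessing
from itertools import repeat


def apply_function(your_dict, otherdata):
    # Hypothetical calculation: scale input1 by otherdata
    out = dict(your_dict)
    out['Output1'] = your_dict['input1'] * otherdata
    return out


if __name__ == '__main__':
    dict_list = [{'input1': i} for i in range(4)]
    otherdata = 10  # hypothetical shared parameter
    with multiprocessing.Pool() as pool:
        a = pool.starmap(apply_function, zip(dict_list, repeat(otherdata)))
        b = pool.map(functools.partial(apply_function, otherdata=otherdata), dict_list)
    print(a == b)  # True
```

Both approaches pickle `otherdata` along with each task; `partial` simply avoids building the argument tuples yourself.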