I'm trying to create a script that parses a file and converts it into a large list, which should then be processed in parallel. I've tried several of Python's multiprocessing approaches, but they all seem to run sequentially.
from itertools import izip_longest  # Python 2; on Python 3 use itertools.zip_longest

def grouper(n, iterable, padvalue=None):
    """grouper(3, 'abcdefg', 'x') -->
    ('a','b','c'), ('d','e','f'), ('g','x','x')"""
    return izip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
def createRecords(givenchunk):
    for i1 in range(len(givenchunk)):
        <create somedata>
        records.append(somedata)
import multiprocessing
from multiprocessing import Manager

if __name__=='__main__':
    manager = Manager()
    parsedcdrs = manager.list([])
    records = manager.list([])

    <some general processing here which creates a shared list "parsedcdrs". Uses map to create a process "p" in some def which is terminated afterwards.>
    # Get available cpus
    cores = multiprocessing.cpu_count()

    # First implementation: map with a chunksize of 5000.
    t = multiprocessing.Pool(cores)
    print "Map processing with chunks containing 5000"
    t.map(createRecords, zip(parsedcdrs), 5000)
    # Second implementation: apply_async per chunk.
    t = multiprocessing.Pool(cores)
    for chunk in grouper(5000, parsedcdrs):
        print "Async processing with chunks containing 5000"
        t.apply_async(createRecords, args=(chunk,), callback=log_result)  # log_result defined elsewhere
    t.close()
    t.join()
    # Third implementation: one Process per chunk.
    jobs = []
    for chunk in grouper(5000, parsedcdrs):
        t = multiprocessing.Process(target=createRecords, args=(chunk,))
        t.start()
        jobs.append(t)
    print "Process processing with chunks containing 5000"
    for j in jobs:
        j.join()
Can someone point me in the right direction?
Answer 0 (score: 0)
Multiprocessing appears to be working correctly in the examples above. The problem was in another def, which caused the performance degradation.
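When a pipeline like this feels sequential, it is worth timing each stage separately before blaming multiprocessing. A minimal sketch (Python 3; the `timed` helper and the stage lambdas are hypothetical stand-ins for the real parsing and record-building code):

```python
import time

def timed(label, fn, *args):
    # Wrap a stage and report how long it takes, so the slow
    # def can be identified before tuning the multiprocessing code.
    start = time.perf_counter()
    result = fn(*args)
    print("%s took %.3f s" % (label, time.perf_counter() - start))
    return result

# Hypothetical stages standing in for the real parsing/record-building code.
parsed = timed("parsing", lambda: list(range(1000)))
records = timed("records", lambda data: [x * 2 for x in data], parsed)
```

Whichever stage dominates the reported times is the one to optimize; parallelizing a stage that takes a fraction of the total wall time will not make the script noticeably faster.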