I am doing some concurrent programming and I have noticed the following:
Version 1, using multiprocessing.Process():
# huge_df is a giant pandas.DataFrame, e.g. 1 billion rows, 50 cols
mgr_dict = multiprocessing.Manager().dict()
jobs = [ multiprocessing.Process(target=worker, args=(huge_df, mgr_dict, i)) for i in xrange(2000) ]
for j in jobs:
    j.start()
for j in jobs:
    j.join()
# blows out all my cores, grinds the server to a halt
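As far as I can tell, Process() does not cap concurrency at all, so all 2000 workers run at the same time and thrash the cores. A minimal sketch of how I could batch the starts myself to stay near the core count (batch_size is my own name; worker, huge_df and mgr_dict are the same as above):
import multiprocessing
# start the jobs in batches no larger than the number of cores, instead of all at once
batch_size = multiprocessing.cpu_count()
jobs = [ multiprocessing.Process(target=worker, args=(huge_df, mgr_dict, i)) for i in xrange(2000) ]
for start in xrange(0, len(jobs), batch_size):
    batch = jobs[start:start + batch_size]
    for j in batch:
        j.start()
    for j in batch:
        j.join()  # wait for this batch to finish before launching the next one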
Version 2, using multiprocessing.Pool():
# huge_df is a giant pandas.DataFrame, e.g. 1 billion rows, 50 cols
mgr_dict = multiprocessing.Manager().dict()
args_iterable = [ (huge_df, mgr_dict, i) for i in xrange(2000) ]
pool = multiprocessing.Pool()
pool.map(worker, args_iterable) # worker in this example unpacks each tuple into separate variables itself
pool.close()
pool.join()
# takes a very long time before it starts using the full allotment of CPU resources
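My suspicion is that the slow ramp-up comes from pool.map pickling huge_df into every one of the 2000 args tuples before the workers get busy. A minimal sketch of an alternative I am considering, where the DataFrame is handed to each worker process once via a pool initializer and only the index is sent per task (init_worker, worker_by_index and _shared are my own placeholder names):
import multiprocessing
_shared = {}
def init_worker(df, shared_dict):
    # runs once in each worker process: stash the big objects in a module-level dict
    _shared['df'] = df
    _shared['out'] = shared_dict
def worker_by_index(i):
    # placeholder body; the real per-task logic would go here
    _shared['out'][i] = len(_shared['df'])
pool = multiprocessing.Pool(initializer=init_worker, initargs=(huge_df, mgr_dict))
pool.map(worker_by_index, xrange(2000))
pool.close()
pool.join()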
So, I am wondering: