I am new to parallel programming. My task is to analyze hundreds of data files. Each file is about 300 MB and can be cut into many slices. My computer has 4 cores, and I want the result for each file as soon as possible.

Analyzing each data file involves 2 procedures. First, the data is read into memory and cut into slices, which is I/O-intensive work. Then a large amount of computation is done on the slices of that file, which is CPU-intensive.

So my strategy is to group the files into groups of 4. For each group of files, first read the data of all 4 files into memory, with 4 files handled by 4 processes. The code looks like:
with Pool(processes=4) as pool:
    data_list = pool.map(read_and_slice, files)  # len(files) == 4
Then for each data in data_list, do the computation with 4 processes:
for data in data_list:  # I want to get the result of each data asap
    with Pool(processes=4) as pool:
        result_list = pool.map(compute, data.slices)  # analyze each slice of data
    analyze(result_list)  # analyze the results of the previous procedure, e.g. get the average
Then move on to the next group.
So the problem is: over the whole run of computing hundreds of files, the pool gets rebuilt many times. How can I avoid the overhead of re-creating the pools and processes? Is there a lot of memory overhead in my code? Is there a better way for me to spend as little time as possible?

Thanks!
Answer 0: (score: 1)
One option is to move the with Pool statement outside of the for loop...
pool = Pool()
for data in data_list:
    result_list = pool.map(compute, data.slices)
    analyze(result_list)
pool.close()
pool.join()
This works in python 2 or 3.
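For completeness, here is a minimal self-contained sketch of that pattern with the standard library; compute and analyze below are placeholder stand-ins for the poster's real functions:

```python
from multiprocessing import Pool

def compute(x):
    # stand-in for the CPU-intensive per-slice work
    return x * x

def analyze(results):
    # stand-in post-processing, e.g. take the average
    return sum(results) / len(results)

def run(data_list):
    pool = Pool(processes=4)  # created once, reused for every file
    try:
        return [analyze(pool.map(compute, data)) for data in data_list]
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the workers to exit

if __name__ == '__main__':
    print(run([range(4), range(4, 8)]))  # [3.5, 31.5]
```

The key point is that the Pool outlives the loop, so the worker processes are forked exactly once instead of once per file.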
If you install (my module) pathos, and then do from pathos.pools import ProcessPool as Pool, and keep the rest of your code exactly as you have it -- then you will only create one Pool. That's because pathos caches the Pool, and when a new Pool instance with the same configuration is created, it just reuses the existing instance. You can do pool.terminate() to shut it down.
>>> from pathos.pools import ProcessPool as Pool
>>> pool = Pool()
>>> data_list = [range(4), range(4,8), range(8,12), range(12,16)]
>>> squared = lambda x:x**2
>>> mean = lambda x: sum(x)/len(x)
>>> for data in data_list:
... result = pool.map(squared, data)
... print mean(result)
...
3
31
91
183
Actually, pathos enables you to do nested pools, so you could also convert the for loop into an asynchronous map (amap in pathos)... and since the inner map does not need to preserve order, you could use an unordered map iterator (imap_unordered in multiprocessing, or uimap in pathos). For examples, see here:
https://stackoverflow.com/questions/28203774/how-to-do-hierarchical-parallelism-in-ipython-parallel and here:
https://stackoverflow.com/a/31617653/2379433
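The unordered inner map is available in the standard library too, if installing pathos is not an option. A sketch, where compute is again a placeholder for the real per-slice work:

```python
from multiprocessing import Pool

def compute(x):
    # stand-in for the CPU-intensive per-slice work
    return x * x

def average_of_slices(slices, pool):
    # imap_unordered yields results as workers finish them, in no
    # particular order; that is fine here because the average does
    # not depend on the order of the results
    results = list(pool.imap_unordered(compute, slices))
    return sum(results) / len(results)

if __name__ == '__main__':
    pool = Pool(processes=4)
    try:
        for slices in [range(4), range(4, 8)]:
            print(average_of_slices(slices, pool))
    finally:
        pool.close()
        pool.join()
```

Skipping the ordering bookkeeping lets fast workers hand back results immediately instead of waiting behind a slow slice.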
pathos is python2-only at the moment, with a separate fork for python3, but the upcoming release will be fully converted to python3.