I have spent some time researching the Python multiprocessing module, its use of os.fork, and shared memory using Array from multiprocessing, for a piece of code I'm writing.
The project itself boils down to this: I have several MxN arrays (say I have three arrays called A, B, and C) that I need to process in order to calculate a new MxN array (called D), where:

Dij = f(Aij, Bij, Cij)

The function f is such that standard vectorized operations cannot be applied. I believe this task is what is termed "embarrassingly parallel". Given the overhead involved in multiprocessing, I am going to break the calculation of D into blocks. For example, if D were 8x8 and I had 4 processes, each processor would be responsible for solving a 4x4 "chunk" of D.
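To make the goal concrete, here is a minimal sketch of the per-element computation each process would perform on its block (the f below is a made-up stand-in, since the real function cannot be vectorized):

import numpy as np

def f(a, b, c):
    # Made-up scalar function; the real f cannot be vectorized
    return a + b * c

m, n = 8, 8
A, B, C = np.random.rand(m, n), np.random.rand(m, n), np.random.rand(m, n)
D = np.empty((m, n))

# One process's share of the work: the block of rows [0, 4)
for i in range(0, 4):
    for j in range(n):
        D[i, j] = f(A[i, j], B[i, j], C[i, j])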
Now, the arrays can potentially be very large (on the order of several GB), so I want all of the arrays to use shared memory (even array D, which will have sub-processes writing to it). I believe I have the shared-array problem solved using a modified version of what is presented here.
However, from an implementation standpoint it would be nice to place arrays A, B, and C into a dictionary. What is unclear to me is whether incrementing the dictionary's reference counter in each sub-process will cause the arrays to be copied in memory.
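To frame what I am asking, here is a minimal sketch (assuming the fork start method, the default on Linux) of the behavior I am hoping for: a write made through the dictionary by a child shows up in the parent, which would mean the mmap-backed buffer is genuinely shared rather than copied when the dict's refcount is bumped:

import multiprocessing as mp
import ctypes
import numpy as np

def child(data):
    # Write through the dict; if the buffer is shared, the parent sees this
    data['source'][0] = 42.0

if __name__ == '__main__':
    base = mp.Array(ctypes.c_double, 4, lock=False)
    data = {'source': np.frombuffer(base)}
    p = mp.Process(target=child, args=(data,))
    p.start()
    p.join()
    print(data['source'][0])  # 42.0 here means shared, not copied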
To try to answer this, I wrote a little test script (see below) and tried running it using valgrind --tool=massif to track memory usage. However, I am not quite sure how to interpret its results. Specifically, does each massif.out file (where the number of files is equal to the number of sub-processes created by my test script + 1) denote the memory used by that process (i.e. I need to sum them all up to get the total memory usage), or do I only need to consider the massif.out associated with the parent process?
Side note: one of my shared-memory arrays has the sub-processes writing to it. I know this sounds like something to be avoided, particularly since I am not using locks to limit writes to one sub-process at any given time. Is this a problem? My thinking is that since the order in which the array is filled is irrelevant, the calculation of any index is independent of any other index, and no sub-process will ever write to the same array index as any other process, there will not be any race conditions. Is this correct?
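For reference, the disjointness I am relying on is just the row partitioning below; since the blocks never overlap, no element should ever be written by two processes:

import numpy as np

m, N = 1000, 2
blocks = [int(b) for b in np.linspace(0, m, N + 1)]  # e.g. [0, 500, 1000]
for rank in range(N):
    # Worker `rank` owns rows [blocks[rank], blocks[rank+1]) and no others
    print(rank, blocks[rank], blocks[rank + 1])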
#! /usr/bin/env python
import multiprocessing as mp
import ctypes
import numpy as np
import time
import sys
import timeit


def shared_array(shape=None, lock=False):
    """
    Form a shared memory numpy array.

    https://stackoverflow.com/questions/5549190/is-shared-readonly-data-copied-to-different-processes-for-python-multiprocessing
    """
    shared_array_base = mp.Array(ctypes.c_double, shape[0]*shape[1], lock=lock)
    # Create a numpy view onto the locked or unlocked shared buffer
    if lock:
        shared_array = np.frombuffer(shared_array_base.get_obj())
    else:
        shared_array = np.frombuffer(shared_array_base)
    shared_array = shared_array.reshape(*shape)
    return shared_array


def worker(indices=None, queue=None, data=None):
    # Loop over each index and "crunch" some data
    for i in indices:
        time.sleep(0.01)
        if data is not None:
            data['sink'][i, :] = data['source'][i, :] + i
        # Place the ID of the completed index into the queue
        queue.put(i)


if __name__ == '__main__':
    # Set the start time
    begin = timeit.default_timer()

    # Size of arrays (m x n)
    m = 1000
    n = 1000

    # Number of processes
    N = 2

    # Create a queue to use for tracking progress
    queue = mp.Queue()

    # Create a dictionary of shared arrays: 'source' with a lock,
    # 'sink' without one
    data = dict()
    data['source'] = shared_array(shape=(m, n), lock=True)
    data['sink'] = shared_array(shape=(m, n), lock=False)

    # Create a list of the indices associated with the m direction
    indices = range(0, m)

    # Parse the indices list into range blocks; each process will get a block
    indices_blocks = [int(i) for i in np.linspace(0, m, N+1)]

    # Initialize a list for storing created sub-processes
    procs = []

    # Print initialization timestamp
    print('Time to initialize: {}'.format(timeit.default_timer() - begin))

    # Create and start each sub-process
    for i in range(1, N+1):
        # Start of the block
        start = indices_blocks[i-1]
        # End of the block
        end = indices_blocks[i]
        # Create the sub-process
        procs.append(mp.Process(target=worker,
                                args=(indices[start:end], queue, data)))
        # Kill the sub-process if/when the parent is killed
        procs[-1].daemon = True
        # Start the sub-process
        procs[-1].start()

    # Initialize a list to store the indices that have been processed
    completed = []

    # Loop while any of the sub-processes are still alive
    while any(p.is_alive() for p in procs):
        # Read the queue, append completed indices, and print the progress
        while not queue.empty():
            done = queue.get()
            if done not in completed:
                completed.append(done)
            message = "\rCompleted {:.2%}".format(float(len(completed))/len(indices))
            sys.stdout.write(message)
            sys.stdout.flush()
    print('')

    # Join all the sub-processes
    for p in procs:
        p.join()

    # Print the run time and the modified sink array
    print('Running time: {}'.format(timeit.default_timer() - begin))
    print(data['sink'])
Edit: I seem to have run into another problem; specifically, a value of n equal to 3 million causes the kernel to kill the process (I assume due to a memory issue). This appears to be related to how shared_array() works (I can create np.zeros arrays of the same size without a problem). After playing around a bit, I get the traceback shown below. I am not entirely sure what is causing the memory allocation error, but a quick Google search turns up discussions of how mmap maps virtual address space, and I am guessing that is smaller than the amount of physical memory a machine has?
Traceback (most recent call last):
  File "./shared_array.py", line 66, in <module>
    data['source'] = shared_array(shape=(m, n), lock=True)
  File "./shared_array.py", line 17, in shared_array
    shared_array_base = mp.Array(ctypes.c_double, shape[0]*shape[1], lock=lock)
  File "/usr/apps/python/lib/python2.7/multiprocessing/__init__.py", line 260, in Array
    return Array(typecode_or_type, size_or_initializer, **kwds)
  File "/usr/apps/python/lib/python2.7/multiprocessing/sharedctypes.py", line 120, in Array
    obj = RawArray(typecode_or_type, size_or_initializer)
  File "/usr/apps/python/lib/python2.7/multiprocessing/sharedctypes.py", line 88, in RawArray
    obj = _new_value(type_)
  File "/usr/apps/python/lib/python2.7/multiprocessing/sharedctypes.py", line 68, in _new_value
    wrapper = heap.BufferWrapper(size)
  File "/usr/apps/python/lib/python2.7/multiprocessing/heap.py", line 243, in __init__
    block = BufferWrapper._heap.malloc(size)
  File "/usr/apps/python/lib/python2.7/multiprocessing/heap.py", line 223, in malloc
    (arena, start, stop) = self._malloc(size)
  File "/usr/apps/python/lib/python2.7/multiprocessing/heap.py", line 120, in _malloc
    arena = Arena(length)
  File "/usr/apps/python/lib/python2.7/multiprocessing/heap.py", line 82, in __init__
    self.buffer = mmap.mmap(-1, size)
mmap.error: [Errno 12] Cannot allocate memory
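For scale, a quick back-of-the-envelope check of what that allocation request amounts to (assuming m stays at 1000 while n is 3 million):

# One m x n shared array of ctypes.c_double (8 bytes per element)
m, n = 1000, 3000000
bytes_needed = m * n * 8
print('{:.1f} GiB per array'.format(bytes_needed / 2.0**30))  # ~22.4 GiB

Since the script creates two such arrays (source and sink), that is roughly 45 GiB of backing in total, which seems consistent with mmap refusing the request once it exceeds what the machine can actually commit.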