Multiprocessing with numpy makes Python quit unexpectedly on OSX

Time: 2013-10-31 11:23:53

Tags: python macos numpy multiprocessing

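I've run into a problem where Python quits unexpectedly when running multiprocessing together with numpy. I have narrowed the problem down, and I can confirm that multiprocessing works perfectly when running the code below: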

import numpy as np
from multiprocessing import Pool, Process
import time
import cPickle as p

def test(args):
    x,i = args
    if i == 2:
        time.sleep(4)
    arr = np.dot(x.T,x)
    print i

if __name__ == '__main__':
    x = np.random.random(size=((2000,500)))
    evaluations = [(x,i) for i in range(5)]
    p = Pool()
    p.map_async(test,evaluations)
    p.close()
    p.join()

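The problem occurs when I run the code below instead. The only difference is one extra call to test() in the parent process before the pool is created, and this makes Python quit unexpectedly: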

import numpy as np
from multiprocessing import Pool, Process
import time
import cPickle as p

def test(args):
    x,i = args
    if i == 2:
        time.sleep(4)
    arr = np.dot(x.T,x)
    print i

if __name__ == '__main__':
    x = np.random.random(size=((2000,500)))
    test((x,4)) # Added code
    evaluations = [(x,i) for i in range(5)]
    p = Pool()
    p.map_async(test,evaluations)
    p.close()
    p.join()

Can anyone help? I'm open to all suggestions. Thanks. Note: I have tried this on two different machines, and the same problem occurs on both.

2 answers:

Answer 0 (score: 6)

This is a known issue with multiprocessing and numpy on MacOS X, and the question is something of a duplicate of:

segfault using numpy's lapack_lite with multiprocessing on osx, not linux

http://mail.scipy.org/pipermail/numpy-discussion/2012-August/063589.html

The answer seems to be to link Numpy against a BLAS other than Apple's Accelerate framework... unfortunately :(
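To see which BLAS a given Numpy build is linked against, its build configuration can be printed. This is only a minimal sketch: the layout of the output differs between Numpy versions, and how an Accelerate-linked build is labelled there (for example, a mention of Accelerate or vecLib) is an assumption rather than something stated in the linked thread.

```python
import numpy as np

# Print the build-time configuration, including the BLAS/LAPACK
# libraries this Numpy build was linked against. The exact format
# depends on the Numpy version.
np.show_config()
```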

Answer 1 (score: 5)

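I found a workaround for the problem. The crash occurs when Numpy has already used BLAS in the parent process before the multiprocessing pool is initialized. My workaround is simply to run the Numpy code that uses BLAS inside a separate process, and only then start the multiprocessing pool. This is not good coding style, but it works. See the examples below: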

The following will fail (Python will quit):

import numpy as np
from multiprocessing import Pool, Process

def test(x):
    arr = np.dot(x.T,x) # On large matrices, this calc will use BLAS.

if __name__ == '__main__':
    x = np.random.random(size=((2000,500))) # Random matrix
    test(x)
    evaluations = [x for _ in range(5)]
    p = Pool()
    p.map_async(test,evaluations) # This is where Python will quit, because of the prior use of BLAS.
    p.close()
    p.join()

The following will succeed:

import numpy as np
from multiprocessing import Pool, Process

def test(x):
    arr = np.dot(x.T,x) # On large matrices, this calc will use BLAS.

if __name__ == '__main__':
    x = np.random.random(size=((2000,500))) # Random matrix
    p = Process(target = test,args = (x,))
    p.start()
    p.join()
    evaluations = [x for _ in range(5)]
    p = Pool()
    p.map_async(test,evaluations)
    p.close()
    p.join()
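On Python 3.4 and later, another option that may avoid the crash is the "spawn" start method, which launches fresh interpreter processes instead of forking the parent. The sketch below was not part of the original post and has not been tested against the setup described in the question. It assumes the crash comes from forked workers inheriting BLAS state from the parent. The names gram and run are illustrative.

```python
import multiprocessing as mp

import numpy as np


def gram(x):
    # On large matrices, np.dot is dispatched to BLAS.
    return np.dot(x.T, x)


def run(n_tasks=5, shape=(2000, 500)):
    x = np.random.random(size=shape)
    gram(x)  # BLAS is used in the parent before the pool exists.
    # Assumption: workers started with "spawn" are fresh interpreters,
    # so they do not inherit the parent's BLAS state the way forked
    # workers do.
    ctx = mp.get_context("spawn")
    with ctx.Pool() as pool:
        results = pool.map(gram, [x] * n_tasks)
    return [r.shape for r in results]


if __name__ == "__main__":
    print(run())
```

The `if __name__ == "__main__":` guard is required with "spawn", because each worker re-imports the main module when it starts.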