Multiprocessing of for loops (numpy.ndarray)

Time: 2015-01-26 18:06:41

Tags: python python-2.7 memory-management multiprocessing

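I am trying to use the multiprocessing module in Python 2.7 to speed up loops over numpy ndarrays. I compute a PMI matrix from an existing matrix 'C' with 6018 rows and 27721 columns. However, running the code below produces "[Errno 12] Cannot allocate memory". I assume the error is related to the variable PMI: if I move it into the pmiCreation function (making PMI a local variable, although I want it to be a global one), the error goes away. That is useless, though, because I need the program to remember the updates made to PMI. Any ideas on how to solve this?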

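# Make the mutual information matrix, PMI
import numpy as np
from math import log10
from multiprocessing import Pool

print "Creating mutual information matrix"
PMI = np.zeros(C.shape)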

def pmiCreation(indexStart):                         
    N = C.sum()
    invN = 1.0/N  # float division (1/N is 0 in Python 2 if C holds integers); multiply by invN instead of dividing by N below
    row, col = C.shape      
    print "Creating mutual information matrix using indexStart:",indexStart         

    for r in range(indexStart, min(indexStart+346, row)):  # u
        for c in range(r):  # w
            if C[r,c]!=0:  # if they co-occur
                num = C[r,c]*invN  # getting number of reviews where u and w co-occur and multiply by invN (numerator)
                denom = (sum(C[:,c])*invN) * (sum(C[r])*invN)
                pmi = log10(num*(1/denom))           
                PMI[r,c] = pmi
                PMI[c,r] = pmi

pool = Pool(processes=8)       # process per core    
index_inputs = [0,346,692,1038,1384, 1730,2166, 2512,2858]    
pool.map(pmiCreation, index_inputs)  
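
A minimal sketch of one possible way around the update problem described above. It is not a confirmed fix for the Errno 12 error. Each worker process has its own copy of module-level globals, so writes to PMI made inside pool.map never reach the parent process. The sketch has each worker return its block of rows and lets the parent assemble the matrix. The names compute_pmi, pmi_block, _init_worker and _shared are hypothetical. It assumes that C holds non-negative counts, that C has at least as many columns as rows (true for 6018 x 27721), and that a square rows x rows PMI is enough, since the original loop only touches indices below the row count.

```python
import numpy as np
from multiprocessing import Pool

# Hypothetical per-process storage, filled once per worker by _init_worker.
_shared = {}


def _init_worker(C, row_sums, col_sums, total):
    # Runs once in every worker so that tasks only need to carry row bounds.
    _shared.update(C=C, row_sums=row_sums, col_sums=col_sums, total=total)


def pmi_block(bounds):
    """Return (start, block): strictly lower-triangle PMI for rows start..stop-1."""
    start, stop = bounds
    C = _shared["C"]
    row_sums = _shared["row_sums"]
    col_sums = _shared["col_sums"]
    total = _shared["total"]
    block = np.zeros((stop - start, C.shape[0]))
    for i, r in enumerate(range(start, stop)):
        counts = C[r, :r].astype(float)
        nz = counts != 0  # only pairs that co-occur
        # log10((C[r,c]/N) / ((colsum[c]/N) * (rowsum[r]/N))), simplified
        block[i, :r][nz] = np.log10(
            counts[nz] * total / (col_sums[:r][nz] * row_sums[r]))
    return start, block


def compute_pmi(C, chunk=346, processes=1):
    """Assemble a symmetric PMI matrix in the parent from per-block results."""
    C = np.asarray(C)
    n = C.shape[0]
    if C.shape[1] < n:
        raise ValueError("sketch assumes at least as many columns as rows")
    stats = (C,
             C.sum(axis=1).astype(float),  # row sums, as in sum(C[r])
             C.sum(axis=0).astype(float),  # column sums, as in sum(C[:,c])
             float(C.sum()))               # N
    tasks = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    if processes == 1:
        # Serial path: same code, no pool, useful for checking results.
        _init_worker(*stats)
        results = [pmi_block(t) for t in tasks]
    else:
        pool = Pool(processes, initializer=_init_worker, initargs=stats)
        try:
            results = pool.map(pmi_block, tasks)
        finally:
            pool.close()
            pool.join()
    PMI = np.zeros((n, n))
    for start, block in results:
        PMI[start:start + block.shape[0], :] = block
    PMI += PMI.T  # mirror the lower triangle; the diagonal stays zero
    return PMI
```

With processes=1 the same code runs serially, which makes it easy to compare against the nested loop on a small matrix before starting a pool, for example with PMI = compute_pmi(C, chunk=346, processes=8).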

0 answers:

No answers yet