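I am trying to use Python 2.7's multiprocessing module to speed up a loop over numpy ndarrays. I am computing a PMI (pointwise mutual information) matrix from an existing matrix `C` with 6018 rows and 27721 columns. However, I get "[Errno 12] Cannot allocate memory" when I run the following code. I assume the error is related to the variable PMI, because if I move it inside the pmiCreation function (making PMI a local variable, although I naturally want it to be global), the error goes away. That is useless to me, though, because I need the program to remember the updates made to PMI. Any ideas on how to fix this?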
    # Imports assumed by the code below (log10 is assumed to come from math)
    import numpy as np
    from math import log10
    from multiprocessing import Pool

    print "Creating mutual information matrix"
    PMI = np.zeros((C.shape))

    def pmiCreation(indexStart):
        N = C.sum()
        invN = 1.0 / N  # replaced divide by N with multiply by invN in formula below; 1.0 forces float division
        row, col = C.shape
        print "Creating mutual information matrix using indexStart:", indexStart
        for r in range(row)[indexStart:indexStart+346]:  # u
            for c in range(r):  # w
                if C[r,c] != 0:  # if they co-occur
                    # number of reviews where u and w co-occur, multiplied by invN (numerator)
                    num = C[r,c] * invN
                    denom = (sum(C[:,c]) * invN) * (sum(C[r]) * invN)
                    pmi = log10(num * (1/denom))
                    PMI[r,c] = pmi
                    PMI[c,r] = pmi

    pool = Pool(processes=8)  # process per core
    index_inputs = [0,346,692,1038,1384, 1730,2166, 2512,2858]
    pool.map(pmiCreation, index_inputs)
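
For context on the behavior described above: with `Pool`, each worker runs in a separate process and works on its own copy of the module-level `PMI`, so assignments made inside `pmiCreation` do not reach the parent's array. Copying arrays of this size into several processes is also a plausible source of the memory error, though that is not confirmed here. One possible way around both issues is to drop the worker processes and let NumPy do the loop. The following is only a minimal sketch, assuming the goal is exactly the loop above (pairs with `c < r`, mirrored into `PMI[c, r]`) and that `C` has at least as many columns as rows. The function name `pmi_lower_triangle` is hypothetical and not part of the original code.

```python
import numpy as np


def pmi_lower_triangle(C):
    """Sketch: log10 PMI for nonzero C[r, c] with c < r, mirrored to [c, r].

    Hypothetical helper, not from the original post. Assumes C has at
    least as many columns as rows, as in the 6018 x 27721 case.
    """
    C = np.asarray(C, dtype=np.float64)  # float avoids integer division
    n_rows = C.shape[0]
    N = C.sum()

    # Compute every row and column total once instead of inside the loop.
    row_sums = C.sum(axis=1)  # sum(C[r]) for each r
    col_sums = C.sum(axis=0)  # sum(C[:, c]) for each c

    PMI = np.zeros(C.shape)

    # All (r, c) pairs strictly below the diagonal, i.e. c < r.
    r_idx, c_idx = np.tril_indices(n_rows, k=-1)
    vals = C[r_idx, c_idx]

    # Keep only the pairs that actually co-occur.
    keep = vals != 0
    r_idx, c_idx, vals = r_idx[keep], c_idx[keep], vals[keep]

    # (C/N) / ((col_sum/N) * (row_sum/N)) simplifies to C*N / (row_sum*col_sum)
    pmi = np.log10(vals * N / (row_sums[r_idx] * col_sums[c_idx]))

    PMI[r_idx, c_idx] = pmi
    PMI[c_idx, r_idx] = pmi
    return PMI
```

It would be called as `PMI = pmi_lower_triangle(C)`. Note that the index arrays hold one entry per (r, c) pair, so for 6018 rows they need a few hundred megabytes on their own. If the process pool has to stay, a common pattern is to have each worker return its block of results and let the parent write them into `PMI`, since `pool.map` collects the return values.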