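A while ago I wrote a matrix multiplier and tried to make it faster with threads, only to find out that the threads all run in the same process. I later discovered the multiprocessing library, which I use in the code below. Now I don't know how to merge the work done by the spawned processes, because their results are not in shared memory.
How can I merge the distributed computation into the "final_multi" variable?
Here is my code: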
#!/usr/bin/env python
import numpy as np
from multiprocessing import Process, Array
T=64
v1 = np.empty([T,T], dtype=np.float32)
v2 = np.empty_like(v1)
final_multi = np.empty_like(v1)
#shared = Array('f', final_multi) This doesnt work
def calclinea(mat1, mat2, fil, col):
    escalar = 0
    for vl in range(T):
        escalar += mat1[fil,vl]*mat2[vl,col]
    return escalar
def mulshared(vec1, vec2, froY, toY, froX, toX):
    global final_multi
    for y in range(froY,toY):
        for x in range(froX, toX):
            final_multi[x,y] = calclinea(vec1,vec2,x,y)
            #shared[x,y] = calclinea(vec1,vec2,x,y)
def main():
    for r in range(T): ### Allocate host memory
        for c in range(T):
            v1[r,c] = r
            v2[r,c] = c+2
            final_multi[r,c] = 0
    #p1 =Process(target=mulshared, args=(v1,v1,0,(T*1/4 -1),0,T))
    #p2 =Process(target=mulshared, args=(v1,v1,(T*1/4),(T*2/4 -1),0,T))
    #p3 =Process(target=mulshared, args=(v1,v1,(T*2/4),(T*3/4 -1),0,T))
    p4 =Process(target=mulshared, args=(v1,v1,T*3/4,T*4/4,0,T)) #All four processes to demo distribution of data, only 4th is initialized so result can be seen, p1 result is all zeros so..
    p4.start()
    p4.join()
    print "\nfinal_multi\n", final_multi
main()
I know this is an inefficient way to multiply matrices; I just want to understand how multiprocessing works. Thanks in advance.
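Answer 0 (score: 0)
You can use the sharedmem module, a third-party enhancement of the multiprocessing module that ships with Python. It provides an easy way to share memory between processes.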
import sharedmem as shmem
out_matrix = shmem.empty((400,400))
def do_work(x):
    out_matrix[100*x:100*(x+1), :] = x
def main():
    with shmem.MapReduce(np=4) as pool:
        pool.map(do_work, range(4))
In this minimal example, the output matrix is filled in parallel by four workers.
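If you would rather stay with the standard library, the same idea can probably be done with multiprocessing.Array, which the question tried in its commented-out line. The sketch below is only an illustration under a few assumptions: it targets Python 3, it uses the "fork" start method (so a Unix-like system), it replaces the hand-written inner loop with NumPy's dot, and the names as_matrix, mul_rows and parallel_matmul are made up for this example. The point is that Array takes a type code and a flat length rather than a 2-D NumPy array, and that each process wraps the shared buffer in a NumPy view with np.frombuffer.

```python
import multiprocessing as mp

import numpy as np


def as_matrix(shared, shape):
    # View the shared ctypes buffer as a 2-D float32 array without copying.
    return np.frombuffer(shared.get_obj(), dtype=np.float32).reshape(shape)


def mul_rows(shared, shape, mat1, mat2, row_from, row_to):
    # Each worker writes only its own band of rows, so the bands never
    # overlap and no explicit locking is used here.
    out = as_matrix(shared, shape)
    out[row_from:row_to, :] = mat1[row_from:row_to, :].dot(mat2)


def parallel_matmul(mat1, mat2, n_workers=4):
    # Assumption: "fork" start method (Unix-like systems). With "spawn",
    # the calls below must sit behind an `if __name__ == "__main__":` guard.
    ctx = mp.get_context("fork")
    shape = (mat1.shape[0], mat2.shape[1])
    # 'f' is a C float (float32); the buffer is flat and zero-initialised.
    shared = ctx.Array("f", shape[0] * shape[1])
    # Row boundaries that also cover sizes not divisible by n_workers.
    bounds = [i * shape[0] // n_workers for i in range(n_workers + 1)]
    workers = [
        ctx.Process(target=mul_rows,
                    args=(shared, shape, mat1, mat2, bounds[i], bounds[i + 1]))
        for i in range(n_workers)
    ]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    # Copy the result out of shared memory into an ordinary array.
    return as_matrix(shared, shape).copy()


if __name__ == "__main__":
    T = 64
    # Same fill pattern as the question: v1[r, c] = r and v2[r, c] = c + 2.
    v1 = np.tile(np.arange(T, dtype=np.float32)[:, None], (1, T))
    v2 = np.tile(np.arange(2, T + 2, dtype=np.float32), (T, 1))
    final_multi = parallel_matmul(v1, v2)
    print(final_multi)
```

The parent never merges anything by hand: every worker writes straight into the shared buffer, and after join() the parent reads the finished matrix through the same kind of view.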