I would really appreciate some help parallelizing the following pseudocode in Julia (and I apologize in advance for the long post):
P, Q # both K by N matrices, K = num features and N = num samples
X, Y # K*4 by N and K*2 by N matrices
tempX, tempY # column vectors of size K*4 and K*2
ndata # a dict from parsing a .m file to be used by a solver with JuMP and Ipopt
# serial version
for i = 1:N
    ndata[P] = P[:, i] # technically requires a for loop from 1 to K since the dict has to be indexed element-wise
    ndata[Q] = Q[:, i]
    ndata_A = run_solver_A(ndata) # with a third-party package and JuMP, Ipopt
    ndata_B = run_solver_B(ndata)
    kX, kY = 1, 1
    for j = 1:K
        tempX[kX:kX+3] = [ndata_A[j][a], ndata_A[j][b], P[j, i], Q[j, i]]
        tempY[kY:kY+1] = [ndata_B[j][a], ndata_B[j][b]]
        kX += 4
        kY += 2
    end
    X[:, i] = deepcopy(tempX)
    Y[:, i] = deepcopy(tempY)
end
Clearly, this for loop over i can be executed independently, as long as no two iterations access the same column of P and Q and each column of P and Q is accessed exactly once. The only thing I need to be careful about is that the columns of X and Y end up paired with the correct tempX and tempY, while I do not particularly care whether the order i = 1, ..., N is maintained (hopefully that makes sense!).
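To make that independence concrete, here is roughly how I think of one iteration as a self-contained unit of work that returns the two columns for sample i. This is only a sketch: run_solver_A/B are the third-party calls from the pseudocode above, and the :P/:Q/:a/:b keys and the process_column name are placeholders for my real dict layout.

function process_column(i, P, Q, K, ndata_template)
    ndata = deepcopy(ndata_template) # private copy, so iterations never share state
    for j = 1:K
        ndata[:P][j] = P[j, i] # element-wise, as in the serial version
        ndata[:Q][j] = Q[j, i]
    end
    ndata_A = run_solver_A(ndata) # third-party solvers, JuMP + Ipopt
    ndata_B = run_solver_B(ndata)
    xcol = zeros(4*K)
    ycol = zeros(2*K)
    for j = 1:K
        xcol[4j-3:4j] = [ndata_A[j][:a], ndata_A[j][:b], P[j, i], Q[j, i]]
        ycol[2j-1:2j] = [ndata_B[j][:a], ndata_B[j][:b]]
    end
    return xcol, ycol
end

With something like this, X[:, i], Y[:, i] = process_column(i, P, Q, K, ndata) reproduces the serial loop body, and the question becomes how to run the calls for different i on different workers.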
I read the official documentation and some online tutorials, and wrote the following with @spawn and fetch. It works for the insertion part, with the ndata[j][a] etc. replaced by the placeholder numbers 1.0 and 180.:
using Distributed
addprocs(2)
num_proc = nprocs()
@everywhere function insertPQ(P, Q)
    println(myid())
    data = zeros(4*length(P))
    k = 1
    for i = 1:length(P)
        data[k:k+3] = [1.0, 180., P[i], Q[i]]
        k += 4
    end
    return data
end
P = [0.99, 0.99, 0.99, 0.99]
Q = [-0.01, -0.01, -0.01, -0.01]
for i = 1:5 # should be 4 x 32
    global P = hcat(P, (P .- 0.01))
    global Q = hcat(Q, (Q .- 0.01))
end
datas = zeros(16, 32) # serial result
datap = zeros(16, 0) # parallel result
@time for i = 1:32
    s = fetch(@spawn insertPQ(P[:, i], Q[:, i]))
    global datap = hcat(datap, s)
end
@time for i = 1:32
    k = 1
    for j = 1:4
        datas[k:k+3, i] = [1.0, 180., P[j, i], Q[j, i]]
        k += 4
    end
end
println(datap == datas)
The code above works fine, but I did notice that the output is always worker 2 -> 3 -> 4 -> 5 -> 2 ..., and the parallel loop is much slower than the serial one (I am testing this on my laptop with only 4 cores, but eventually I will run it on a cluster). Once I added the run_solver_A/B calls into insertPQ(), it took so long that I had to stop the run.
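If I had to guess, part of the problem is that I fetch each future immediately after spawning it, so only one worker is ever busy at a time. Something like the following (an untested sketch reusing my toy insertPQ above) is what I suspect I should be doing instead:

futures = Vector{Future}(undef, 32)
for i = 1:32
    futures[i] = @spawn insertPQ(P[:, i], Q[:, i]) # launch everything first
end
datap2 = reduce(hcat, fetch.(futures)) # then collect: 16 x 32, columns in order of i

But I am not sure this is the right pattern for the real run_solver_A/B workload either.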
As for pmap(), I could not figure out how to pass an entire vector to the function. I may be misreading the documentation, but "Transform collection c by applying f to each element using available workers and tasks" sounds like I can only apply it element by element? That can't be right. I went to a Julia intro session last week and asked the instructor about this; he said I should use pmap, and I have been trying to make it work ever since.
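For reference, this is the kind of pmap call I have been trying to write: mapping over column indices (or over eachcol) so that each call still receives whole vectors rather than single elements. Again, it is only a sketch built on my toy insertPQ, with fill standing in for my real P and Q, and with the setup repeated so the snippet is self-contained:

using Distributed
addprocs(2)

@everywhere function insertPQ(p, q)
    data = zeros(4*length(p))
    for i = 1:length(p)
        data[4i-3:4i] = [1.0, 180., p[i], q[i]]
    end
    return data
end

P = fill(0.99, 4, 32) # stand-in data with the same shape as my toy example
Q = fill(-0.01, 4, 32)

# map over column indices: each call gets the whole i-th column, not one element
cols = pmap(i -> insertPQ(P[:, i], Q[:, i]), 1:32)
datap = reduce(hcat, cols) # 16 x 32; pmap returns results in input order

# equivalently, map directly over the columns
cols2 = pmap(insertPQ, eachcol(P), eachcol(Q))

I am not sure whether this generalizes once the run_solver_A/B calls are in the picture, though.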
So, how can I parallelize my original pseudocode? Any help or suggestions would be greatly appreciated!