Question

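I have the following pseudocode (a loop) for a variable step-size method, and I am trying to implement it with the MATLAB Parallel Computing Toolbox or the MATLAB Distributed Computing Server. I already have MATLAB code for this loop that runs in plain MATLAB 2013a.

Given: u0, t0, T (the initial value, the start time and the end time), and the initial step size h0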

while t0 < T

    % The first step is to compute U1 and U2, which depend on t0 and some known parameters.
    U1(t0, h0, u0, parameters)
    U2(t0, h0, u0, parameters)
    % U1 and U2 are independent, so they can be computed in parallel.

    % The next step is to compute U3, U4, U5, U6, which depend on t0, U1, U2 and known parameters.
    U3(t0, h0, u0, U1, U2, parameters)
    U4(t0, h0, u0, U1, U2, parameters)
    U5(t0, h0, u0, U1, U2, parameters)
    U6(t0, h0, u0, U1, U2, parameters)
    % U3, U4, U5, U6 are independent, so they can also be computed in parallel.

    % Finally, compute U7 and U8, which depend on U1, U2, ..., U6.
    U7(t0, u0, h0, U1, U2, U3, U4, U5, U6)
    U8(t0, u0, h0, U1, U2, U3, U4, U5, U6)
    % U7 and U8 are also independent, so they can be computed in parallel as well.

    % Step-size control is performed here, and then h0 := h_new is assigned.
    t0 = t0 + h_new

end

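Could you suggest the best way to implement the code above in parallel with MATLAB? By "best" I mean the largest possible speedup of the whole computation. (I have access to the supercomputer LEO III, which has 162 compute nodes with 1944 cores in total, so each node has 12 cores.)

My idea is to compute U1 and U2 at the same time on two separate workers (cores) that each have their own memory. With the results for U1 and U2, I can compute U3, U4, U5, U6 in the same way, and finally U7 and U8. For this, I think I need to use PARFOR inside a matlabpool? But I do not know how many loop indices are needed (corresponding to the number of cores/processors).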
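A minimal sketch of this idea for the first stage only, assuming the independent computations can be wrapped as function handles. The matrix `A` and the two formulas are hypothetical stand-ins for the real U1 and U2, which the post does not give. The point is that the parfor index simply runs over the number of independent tasks in the stage (2 here), not over the number of cores.

```matlab
% Hypothetical data: A stands in for the "known parameters" of the post.
A  = magic(4);
u0 = ones(4, 1);
t0 = 0;
h0 = 0.1;

% Placeholder formulas for the two independent stage-1 computations.
stage1 = {@(t, h, u) u + h * (A * u), ...   % plays the role of U1
          @(t, h, u) u - h * (A * u)};      % plays the role of U2

out = cell(1, numel(stage1));
parfor k = 1:numel(stage1)
    f = stage1{k};              % pick the k-th task
    out{k} = f(t0, h0, u0);     % each iteration may run on a different worker
end

U1 = out{1};
U2 = out{2};
```

If no worker pool is available, the parfor loop still runs, just without any parallel speedup.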

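My questions are:

I have access to the supercomputer mentioned above, so can I use the MATLAB Distributed Computing Server?
For this code, should I use the Parallel Computing Toolbox or the MATLAB Distributed Computing Server? I mean, with the Parallel Computing Toolbox (local workers) I cannot specify which workers will compute U1 and U2 (and likewise U3, U4, ...), because they share memory and run interactively. Is that right?
If I go with the proposed idea, how many workers do I need? Probably 8 cores? Is it better to use 1 compute node and request 9 cores (8 for the workers, 1 for the MATLAB session), or to use 8 compute nodes?

I am a beginner in MATLAB parallel computing. Please give me your suggestions! Thank you!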

Answer 1

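I suggest parallelizing the while loop, since you want to distribute many iterations among the nodes. Parfor is the easiest way to get started with parallel computing, and it will work nicely for your immediate problem. Only use the server if there are many time steps that each take a long time, because any parallelization comes with a certain amount of overhead.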
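The overhead can be checked directly by timing the same work with a plain for loop and with parfor. The following is only a rough sketch: the matrix size and the number of tasks are arbitrary assumptions, and in releases that open a worker pool automatically the first parfor call also includes the pool startup time.

```matlab
% Hypothetical workload: 8 small matrix-vector products (sizes are assumptions).
n      = 500;
A      = rand(n);
v      = rand(n, 1);
nTasks = 8;

% Serial reference run.
serialOut = cell(1, nTasks);
tic
for k = 1:nTasks
    serialOut{k} = A * (k * v);
end
tSerial = toc;

% The same work distributed with parfor.
parOut = cell(1, nTasks);
tic
parfor k = 1:nTasks
    parOut{k} = A * (k * v);
end
tParallel = toc;

fprintf('serial: %.4f s, parfor: %.4f s\n', tSerial, tParallel);
```

For tasks this small, the parfor version may well be slower than the serial one, which is the situation the answer warns about.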

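Local computing lets you use 12 cores in recent versions of MATLAB; make sure you have enough RAM to hold 13 copies of the loop body in memory. With a good processor architecture and no other programs competing for resources, you can run on all cores.

So: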

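timeSteps = t0:h0:T;                    % h0 is the step size from the question
result = cell(numel(timeSteps), 2);     % preallocate the output

parfor timeIdx = 1:numel(timeSteps)
    t = timeSteps(timeIdx);             % time of this step, local to the iteration

    % calculate all your U's here

    % collect the output; a single row assignment keeps result a sliced variable
    result(timeIdx, :) = {U7, U8};

end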

Answer 2

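I would like to add that all the computations of U1, ..., U8 call a function that computes matrix-vector products. Assume for now that we do not care how long they take (not long in my case). The point is that with previous methods, U1, ..., U8 are not independent (they are coupled!). That means that to compute U_{i+1} you need U_{i}, so you have to compute them sequentially, one after another. Now I have been able to construct a method that allows U1 and U2 to be computed simultaneously (independently), and likewise U3, ..., U6 and U7, U8. So I want to save CPU time over the whole computation. That is why I think MATLAB parallel computing could be used.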
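The dependency structure described here (three stages, parallel within a stage, sequential between stages) can be sketched as one parfor per stage inside the while loop. Everything concrete below is an assumption for illustration: the matrix `A`, all stage formulas and the trivial step-size rule are placeholders, not the actual method. Whether this pays off depends on how expensive each matrix-vector product is compared with the cost of dispatching work to the workers.

```matlab
% Hypothetical data: A stands in for the "known parameters" of the post.
A  = [-2 1; 1 -2];
u0 = [1; 0];
t0 = 0;  T = 1;  h0 = 0.25;

while t0 < T
    h0 = min(h0, T - t0);          % do not step past T

    % Stage 1: U1 and U2 need only u0 (placeholder formulas).
    f1 = {@(u) u + (h0/2) * (A*u), ...
          @(u) u + h0 * (A*u)};
    r1 = cell(1, 2);
    parfor k = 1:2
        f = f1{k};
        r1{k} = f(u0);
    end
    [U1, U2] = r1{:};

    % Stage 2: U3..U6 need U1 and U2 (placeholder formulas).
    f2 = {@(a, b) u0 + h0 * (A*a), ...
          @(a, b) u0 + h0 * (A*b), ...
          @(a, b) u0 + (h0/2) * (A*(a + b)), ...
          @(a, b) u0 + (h0/2) * (A*(a - b))};
    r2 = cell(1, 4);
    parfor k = 1:4
        f = f2{k};
        r2{k} = f(U1, U2);
    end
    [U3, U4, U5, U6] = r2{:};

    % Stage 3: U7 and U8 need all of U1..U6 (placeholder formulas).
    W  = [U1, U2, U3, U4, U5, U6];
    f3 = {@(w) mean(w, 2), ...
          @(w) w(:, end)};
    r3 = cell(1, 2);
    parfor k = 1:2
        f = f3{k};
        r3{k} = f(W);
    end
    [U7, U8] = r3{:};              % U8 is computed but unused by this placeholder

    % Placeholder for step-size control: accept U7 and keep the step size.
    u0    = U7;
    h_new = h0;
    t0    = t0 + h0;               % advance by the step just taken
    h0    = h_new;
end
```

At most four tasks are ever runnable at the same time in this layout, because the widest stage has four independent computations.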

Should I use the Parallel Computing Toolbox or the MATLAB Distributed Computing Server?
