Question

我正在构建一个大的ParallelTable，并且想要了解计算的进展情况。对于非并行表，以下代码可以很好地完成：

counter = 1;
Timing[
 Monitor[
  Table[
   counter++
  , {n, 10^6}];
 , ProgressIndicator[counter, {0, 10^6}]
 ]
]

结果为{0.943512, Null}。但是，对于并行情况，有必要在内核之间共享counter：

counter = 1;
SetSharedVariable[counter];
Timing[
 Monitor[
  ParallelTable[
   counter++
  , {n, 10^4}];
 , ProgressIndicator[counter, {0, 10^4}]
 ]
]

结果为{6.33388, Null}。由于counter的值需要在每次更新时在内核之间来回传递，因此性能损失将非常严重。有关如何了解计算方法的任何想法？也许让每个内核都有自己的counter值并每隔一段时间对它们求和？也许某种方法可以确定表格的哪些元素已经被归结为内核？

Answer 1

当你说“也许让每个内核都有自己的反值并且每隔一段时间对它们求和一次？”时，你几乎自己给出了答案。

尝试这样的事情：

counter = 1;
SetSharedVariable[counter];
ParallelEvaluate[last = AbsoluteTime[]; localcounter = 1;]
Timing[Monitor[
  ParallelTable[localcounter++; 
    If[AbsoluteTime[] - last > 1, last = AbsoluteTime[]; 
     counter += localcounter; localcounter = 0;], {n, 10^6}];, 
  ProgressIndicator[counter, {0, 10^6}]]]

请注意，它只需要比第一个单CPU案例更长的时间，因为它实际上在循环中做了一些事情。

您可以更改测试AbsoluteTime [] - last＆gt; 1比较像AbsoluteTime []的更频繁 - 最后＆gt; 0.1。

Answer 2

这似乎很难解决。来自manual：

除非您使用共享变量，否则执行并行评估是完全独立的，不能相互影响。此外，任何副作用，如变量的赋值，作为评估的一部分，将会丢失。一个唯一的影响并行评估是在最后返回结果。

但是，仍然可以使用旧的Print语句获取粗略的进度指示器：

enter image description here

Answer 3

另一种方法是对LinkWrite和LinkRead进行跟踪，并修改其跟踪消息以进行一些有用的记帐。

首先，启动一些并行内核：

LaunchKernels[]

这将设置并行内核的链接对象。

然后为链接读写计数器定义一个init函数：

init[] := Map[(LinkWriteCounter[#] = 0; LinkReadCounter[#] = 0) &, Links[]]

接下来，您希望在读取或写入其链接时递增这些计数器：

Unprotect[Message];
Message[LinkWrite::trace, x_, y_] := LinkWriteCounter[x[[1, 1]]] += 1;
Message[LinkRead::trace, x_, y_] := LinkReadCounter[x[[1, 1]]] += 1;
Protect[Message];

此处，x[[1,1]]是相关的LinkObject。

现在，打开LinkWrite和LinkRead上的跟踪：

On[LinkWrite];
On[LinkRead];

要格式化进度显示，首先要稍微缩短LinkObject显示，因为它们相当冗长：

Format[LinkObject[k_, a_, b_]] := Kernel[a, b]

这是一种为子内核链接动态显示读写的方法：

init[];
Dynamic[Grid[Join[
  {{"Kernel", "Writes", "Reads"}}, 
  Map[{#, LinkWriteCounter[#]/2, LinkReadCounter[#]/2} &, 
  Select[Links[], StringMatchQ[First[#], "*subkernel*"] &
]]], Frame -> All]]

（我将计数除以2，因为每次读取和写入的链接都被跟踪了两次）。

最后用10,000个元素表测试它：

init[];
ParallelTable[i, {i, 10^4}, Method -> "FinestGrained"];

如果一切正常，您应该看到最终进度显示，每个内核大约有5,000次读写：

Screen shot of the kernel session

对此有中等性能损失：没有显示器的10.73秒，以及显示器的13.69秒。当然，使用“FinestGrained”选项并不是用于此特定并行计算的最佳方法。

Answer 4

您可以从Yuri Kandrashkin开发的Spin`System`LoopControl`包中获得一些想法：

screenshot from Spin Algebra home

Announce of the Spin` package:

Hi group,

I have prepared the package Spin` that consists of several applications
which are designed for research in the area of magnetic resonance and 
spin chemistry and physics.

The applications Unit` and LoopControl` can be useful to a broader
audience.

The package and short outline is available at:
http://sites.google.com/site/spinalgebra/.

Sincerely,
Yuri Kandrashkin.

监控Mathematica中并行计算的进度

4 个答案: