Question

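I am trying to improve a Parallel.For loop in the main part of my code. This loop contains the main calculation, which has to run more than a million times for each output (and I need 80 million outputs). So any improvement, however small, could have a serious effect on execution time. I know that an if condition slows down parallel computation. I also know that the main variables (U[i,j] and V[i,j]) are always zero at certain positions. So, if I could assign a constant zero to those particular columns of the arrays (so that the calculation never changes them), I could eliminate the if condition from the code.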

Before calculation:
| 1 1 1 0 1|
| 1 1 1 0 1|
| 1 1 1 0 1|
| 1 1 1 0 1|
After calculation:
| 3 1 8 0 5|
| 1 4 4 0 1|
| 7 3 1 0 8|
| 1 1 5 0 7|

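I want a column whose value is always zero.

How can I assign a constant (zero) to particular columns of a 2D array?

As a sample, the part in question looks like this: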

double[,] U = new double[nx, ny];
double[,] V = new double[nx, ny];

Parallel.For(0, nx, i =>
{
    for (int j = 0; j < ny; j++)
    {
        if (i != a && i != b && i != c && i != d)
        {
            U[i, j] = ...; // A big chunk of calculations
            V[i, j] = ...; // A big chunk of calculations
        }
    }
});

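Interestingly, when I run the code I see that it uses only about 20% of all cores. Is that because my parallel loop is weak, or should I manually set the number of cores the loop uses?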

Answer 1

Couldn't this be improved as follows?

Parallel.For(0, nx, i =>
{
    // Test the row index once, outside the inner loop
    if (i != a && i != b && i != c && i != d)
    {
        for (int j = 0; j < ny; j++)
        {
            U[i, j] = ...; // A big chunk of calculations
            V[i, j] = ...; // A big chunk of calculations
        }
    }
});

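This way the inner loop is evaluated only when i passes the condition (i is none of a, b, c, d). Otherwise an excluded row still spins through all ny iterations without doing any work, which amounts to a busy wait.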
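A minimal runnable sketch of this idea, under a few assumptions that are not in the question: the excluded indices are collected in a hypothetical `HashSet<int>` named `skip`, and the expressions `i + j + 1.0` and `i * j + 1.0` are placeholders for the elided calculations. Since C# zero-initializes a `new double[nx, ny]`, rows that are never written simply stay at zero, so no explicit "constant column" is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SkipRowsSketch
{
    // Fills U and V for every row i that is not in `skip`.
    // Skipped rows are never written, so they keep the default value 0.0.
    public static void Compute(double[,] U, double[,] V, HashSet<int> skip)
    {
        int nx = U.GetLength(0);
        int ny = U.GetLength(1);

        Parallel.For(0, nx, i =>
        {
            // One test per row instead of one test per element.
            // Returning from the lambda ends only this iteration.
            if (skip.Contains(i)) return;

            for (int j = 0; j < ny; j++)
            {
                U[i, j] = i + j + 1.0; // placeholder for the real calculation
                V[i, j] = i * j + 1.0; // placeholder for the real calculation
            }
        });
    }

    public static void Main()
    {
        var U = new double[5, 4];
        var V = new double[5, 4];
        Compute(U, V, new HashSet<int> { 3 }); // hypothetical: index 3 stays zero
        Console.WriteLine($"U[3,0] = {U[3, 0]}, U[2,0] = {U[2, 0]}");
    }
}
```

The set is only read inside the loop, never modified, so sharing it across the worker threads is safe here.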

Answer 2

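Compute the borders in separate kernels, because they are the only part that has "if" clauses. Then compute the interior without any condition. Expect about a 2x speedup.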

// interior (does not include the borders)
Parallel.For(1, nx - 1, i =>
{
    for (int j = 1; j < ny - 1; j++)
    {
        U[i, j] = ...; // A big chunk of calculations
        V[i, j] = ...; // A big chunk of calculations
    }
});

// exterior 1
Parallel.For(xx, xx1, i =>
{
    // another calculation
});

// exterior 2
Parallel.For(xx1, yy, i =>
{
    // another calculation
});

// exterior 3
Parallel.For(yy, yy1, i =>
{
    // another calculation
});

// exterior 4
Parallel.For(yy1, xx, i =>
{
    // another calculation
});
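Applied to the question's case, where whole index values are excluded rather than a rectangular border, the same "no condition in the hot loop" idea could be sketched as below. This is only an illustration under stated assumptions: the `excluded` array is a hypothetical stand-in for a, b, c, d, the two assignments are placeholders for the elided calculations, and the excluded rows are left at their default zero instead of getting "another calculation".

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class SegmentedLoopSketch
{
    // Runs the unconditional kernel on every row range that lies between
    // consecutive excluded rows. Excluded rows are never touched and stay 0.0.
    public static void Compute(double[,] U, double[,] V, int[] excluded)
    {
        int nx = U.GetLength(0);
        int ny = U.GetLength(1);

        // Sorted, de-duplicated, in-range cut points; nx is a sentinel
        // that closes the final segment.
        var cuts = excluded.Where(x => x >= 0 && x < nx)
                           .Distinct()
                           .OrderBy(x => x)
                           .Append(nx);

        int start = 0;
        foreach (int stop in cuts)
        {
            // Half-open range [start, stop). An empty range runs no iterations.
            Parallel.For(start, stop, i =>
            {
                for (int j = 0; j < ny; j++)
                {
                    U[i, j] = i + j + 1.0; // placeholder for the real calculation
                    V[i, j] = i * j + 1.0; // placeholder for the real calculation
                }
            });
            start = stop + 1; // step over the excluded row
        }
    }

    public static void Main()
    {
        var U = new double[5, 3];
        var V = new double[5, 3];
        Compute(U, V, new[] { 1, 3 }); // hypothetical excluded indices
        Console.WriteLine($"U[1,0] = {U[1, 0]}, U[4,2] = {U[4, 2]}");
    }
}
```

Each `Parallel.For` call waits for its segment to finish before the next one starts, so this layout probably only pays off when the excluded indices are few and the segments between them are large.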

Using a C++ DLL for the inner loop can give a speedup of more than 10x (SIMD), or even OpenCL on a GPGPU -> a 30x speedup.
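For a first experiment with SIMD that stays in C#, `System.Numerics.Vector<T>` is one option; note that this is a substitute for the C++ DLL suggested above, not the same thing, and no particular speedup is implied. The sketch below assumes the data for one row is held in plain 1D arrays (a `double[,]` cannot be passed to the `Vector<double>` constructor directly) and uses `a * b + a` as a placeholder formula.

```csharp
using System;
using System.Numerics;

public static class SimdSketch
{
    // Computes dst[k] = a[k] * b[k] + a[k], several elements per step
    // when the hardware supports it. All three arrays must have equal length.
    public static void MulAdd(double[] a, double[] b, double[] dst)
    {
        int width = Vector<double>.Count; // lanes per vector on this machine
        int k = 0;

        // Vectorized main loop over full-width chunks.
        for (; k <= a.Length - width; k += width)
        {
            var va = new Vector<double>(a, k);
            var vb = new Vector<double>(b, k);
            (va * vb + va).CopyTo(dst, k);
        }

        // Scalar tail for the remaining elements.
        for (; k < a.Length; k++)
        {
            dst[k] = a[k] * b[k] + a[k];
        }
    }

    public static void Main()
    {
        var a = new double[] { 1, 2, 3, 4, 5 };
        var b = new double[] { 2, 2, 2, 2, 2 };
        var dst = new double[5];
        MulAdd(a, b, dst);
        Console.WriteLine(string.Join(", ", dst));
    }
}
```

`Vector.IsHardwareAccelerated` reports whether these operations actually map to SIMD instructions on the current runtime.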
