Speeding up reads and writes to a 2D array

Posted: 2016-10-15 16:27:40

Tags: c arrays performance loops optimization

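I have a grid in which each element holds 9 values. At every time step, some values are read from neighboring elements, a trivial calculation is performed, and the new values are written back to the same addresses.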

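On my machine the program runs in about 3 minutes. However, an alternative program that simply reads the values from the neighboring elements and writes them back (i.e., the program below without the intermediate calculations) runs in only 50 seconds.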

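Assuming the intermediate calculations do not take long (I may be wrong about this), how can I speed up the first program so that it performs about as well as the second? The problem is most likely cache-related, but every change I have tried has either made no difference or made performance worse.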

What I have tried so far

Switching the grid array from a structure of arrays (grid[9][256 * 256]) to an array of structures (grid[256 * 256][9]) seems to have a negligible effect on performance.

Hoisting the a[9] array out of the loop does not seem to affect performance either.

I ran the code through a profiler, which reports poor CPU performance on every access to an element of grid.

Simplified program:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
  // Structure of arrays: 9 arrays of 256*256 doubles
  double **grid = (double**)malloc(9*sizeof(double*));
  for(int i = 0; i < 9; i++)
    grid[i] = (double*)malloc(256*256*sizeof(double));

  // Array of structures: 256*256 arrays of 9 doubles (use with the Vim command below)
//  double **grid = (double**)malloc(256*256*sizeof(double*));
//  for(int i = 0; i < 256*256; i++)
//    grid[i] = (double*)malloc(9*sizeof(double));

  // As posted, the program never initialized grid, so every read below used
  // indeterminate values. Placeholder initial state (an assumption; use the
  // real one). The Vim command below also swaps these indices correctly.
  for (int i = 0; i < 9; i++)
    for (int c = 0; c < 256*256; c++)
      grid[i][c] = 1.0;

  double res = 0.0;
  for (int tt = 0; tt < 80000; tt++) {
    for (int ii = 0; ii < 256; ii++) {
      for (int jj = 0; jj < 256; jj++) {
        // Periodic neighbor rows and columns (wrap around at the edges)
        int up = (ii + 1) % 256;
        int rt = (jj + 1) % 256;
        int dn = (ii == 0) ? 255 : (ii - 1);
        int lf = (jj == 0) ? 255 : (jj - 1);

        // Gather: sums of values read from this cell and its neighbors
        double sum = grid[0][ii*256 + jj] + grid[1][ii*256 + lf]
                   + grid[2][dn*256 + jj] + grid[3][ii*256 + rt]
                   + grid[4][up*256 + jj] + grid[5][dn*256 + lf]
                   + grid[6][dn*256 + rt] + grid[7][up*256 + rt]
                   + grid[8][up*256 + lf];

        double odd = (   grid[1][ii*256 + jj] + grid[3][up*256 + lf]
                       + grid[5][dn*256 + rt] + grid[7][up*256 + rt]
                     ) / sum;

        double even = (   grid[0][ii*256 + jj] + grid[2][up*256 + lf]
                        + grid[4][dn*256 + rt] + grid[6][dn*256 + lf]
                        + grid[8][ii*256 + lf]
                      ) / sum;

        double hypot = odd*odd + even*even;

        // Per-direction multipliers (a[0] is unused)
        double a[9];
        a[1] = (   odd        ) * hypot;
        a[2] = (         even ) * hypot;
        a[3] = ( - odd        ) * hypot;
        a[4] = (       - even ) * hypot;
        a[5] = (   odd + even ) * hypot;
        a[6] = ( - odd + even ) * hypot;
        a[7] = ( - odd - even ) * hypot;
        a[8] = (   odd - even ) * hypot;

        // Scatter: write the new values back to the same addresses.
        // Note: the assignments to grid[3], grid[4], grid[7] and grid[8] read
        // grid[1], grid[2], grid[5] and grid[6] after those were overwritten
        // two lines earlier, so each pair is not a simultaneous swap.
        sum = 0.0;
        sum += ( grid[0][ii*256 + jj] = hypot * grid[0][ii*256 + jj] );
        sum += ( grid[1][ii*256 + lf] = a[3]  * grid[3][ii*256 + rt] );
        sum += ( grid[2][dn*256 + jj] = a[4]  * grid[4][up*256 + jj] );
        sum += ( grid[3][ii*256 + rt] = a[1]  * grid[1][ii*256 + lf] );
        sum += ( grid[4][up*256 + jj] = a[2]  * grid[2][dn*256 + jj] );
        sum += ( grid[5][dn*256 + lf] = a[7]  * grid[7][up*256 + rt] );
        sum += ( grid[6][dn*256 + rt] = a[8]  * grid[8][up*256 + lf] );
        sum += ( grid[7][up*256 + rt] = a[5]  * grid[5][dn*256 + lf] );
        sum += ( grid[8][up*256 + lf] = a[6]  * grid[6][dn*256 + rt] );

        res += sum;
      }
    }
  }

  printf("%f\n", res);
  return 0;
}

Vim command that swaps the index order of every 2D array access in the program:

:%s/\[\([^\]]\+\)\]\[\([^\]]\+\)]/\[\2\]\[\1\]/g
