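I have a grid in which every element holds 9 values. At each time step, the program reads some values from neighbouring elements, performs some trivial calculations, and writes the new values back to the same addresses.
On my machine this program runs in about 3 minutes. However, an alternative program that simply reads the values from the neighbouring elements and writes them back (i.e. the program below without the intermediate calculations) runs in only 50 seconds.
Assuming the intermediate calculations themselves do not take long to compute (I may be wrong), how can I speed up the former program to reach performance similar to the latter? The problem is most likely cache-related, but every change I have tried has either had no effect on performance or made it worse.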
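What I have tried so far
Switching the grid array from a structure of arrays (grid[9][256*256]) to an array of structures (grid[256*256][9]) seems to have a negligible effect on performance.
Hoisting the a[9] array out of the loops does not seem to affect performance either.
I ran the code through a profiler, which reported poor CPU performance wherever an element of grid is accessed.
Simplified program: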
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Structure-of-arrays layout: 9 planes of 256*256 doubles.
       Note: malloc does not initialise the contents in this simplified version. */
    double **grid = (double**)malloc(9*sizeof(double*));
    for(int i = 0; i < 9; i++)
        grid[i] = (double*)malloc(256*256*sizeof(double));

    /* Array-of-structures alternative: */
    // double **grid = (double**)malloc(256*256*sizeof(double*));
    // for(int i = 0; i < 256*256; i++)
    //     grid[i] = (double*)malloc(9*sizeof(double));

    double res = 0.0;
    for (int tt = 0; tt < 80000; tt++) {
        for (int ii = 0; ii < 256; ii++) {
            for (int jj = 0; jj < 256; jj++) {
                /* Neighbour indices with periodic (wrap-around) boundaries */
                int up = (ii + 1) % 256;
                int rt = (jj + 1) % 256;
                int dn = (ii == 0) ? 255 : (ii - 1);
                int lf = (jj == 0) ? 255 : (jj - 1);

                /* Intermediate calculations */
                double sum = grid[0][ii*256 + jj] + grid[1][ii*256 + lf]
                           + grid[2][dn*256 + jj] + grid[3][ii*256 + rt]
                           + grid[4][up*256 + jj] + grid[5][dn*256 + lf]
                           + grid[6][dn*256 + rt] + grid[7][up*256 + rt]
                           + grid[8][up*256 + lf];
                double odd = ( grid[1][ii*256 + jj] + grid[3][up*256 + lf]
                             + grid[5][dn*256 + rt] + grid[7][up*256 + rt]
                             ) / sum;
                double even = ( grid[0][ii*256 + jj] + grid[2][up*256 + lf]
                              + grid[4][dn*256 + rt] + grid[6][dn*256 + lf]
                              + grid[8][ii*256 + lf]
                              ) / sum;
                double hypot = odd*odd + even*even;

                /* Coefficients; a[0] is unused because grid[0] is scaled by hypot directly */
                double a[9];
                a[1] = ( odd ) * hypot;
                a[2] = ( even ) * hypot;
                a[3] = ( - odd ) * hypot;
                a[4] = ( - even ) * hypot;
                a[5] = ( odd + even ) * hypot;
                a[6] = ( - odd + even ) * hypot;
                a[7] = ( - odd - even ) * hypot;
                a[8] = ( odd - even ) * hypot;

                /* Write the new values back to the same addresses and accumulate their sum */
                sum = 0.0;
                sum += ( grid[0][ii*256 + jj] = hypot * grid[0][ii*256 + jj] );
                sum += ( grid[1][ii*256 + lf] = a[3] * grid[3][ii*256 + rt] );
                sum += ( grid[2][dn*256 + jj] = a[4] * grid[4][up*256 + jj] );
                sum += ( grid[3][ii*256 + rt] = a[1] * grid[1][ii*256 + lf] );
                sum += ( grid[4][up*256 + jj] = a[2] * grid[2][dn*256 + jj] );
                sum += ( grid[5][dn*256 + lf] = a[7] * grid[7][up*256 + rt] );
                sum += ( grid[6][dn*256 + rt] = a[8] * grid[8][up*256 + lf] );
                sum += ( grid[7][up*256 + rt] = a[5] * grid[5][dn*256 + lf] );
                sum += ( grid[8][up*256 + lf] = a[6] * grid[6][dn*256 + rt] );
                res += sum;
            }
        }
    }
    /* Print the accumulated result so the compiler cannot discard the loops */
    printf("%f\n", res);
    return 0;
}
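For comparison, the copy-only variant mentioned at the top might look roughly like the sketch below. This is only one plausible reading of "reads the values and writes them back": it keeps the same reads and writes as the simplified program but replaces every coefficient with 1. The helper names (grid_new, grid_free, copy_only_step, time_steps), the fill value, and the parameterised grid size n are all assumptions made for illustration; they are not part of the original program.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define NV 9 /* values per grid element, as in the question */

/* Hypothetical helper: allocate NV planes of n*n doubles, each set to `fill`.
 * Returns NULL on allocation failure. */
double **grid_new(int n, double fill) {
    assert(n > 0);
    double **grid = malloc(NV * sizeof *grid);
    if (!grid) return NULL;
    for (int k = 0; k < NV; k++) {
        grid[k] = malloc((size_t)n * n * sizeof **grid);
        if (!grid[k]) {
            while (k-- > 0) free(grid[k]); /* release planes allocated so far */
            free(grid);
            return NULL;
        }
        for (int c = 0; c < n * n; c++) grid[k][c] = fill;
    }
    return grid;
}

void grid_free(double **grid) {
    if (!grid) return;
    for (int k = 0; k < NV; k++) free(grid[k]);
    free(grid);
}

/* Assumed copy-only step: same access pattern as the simplified program,
 * but no intermediate calculations. Returns the sum of the written values. */
double copy_only_step(double **grid, int n) {
    double res = 0.0;
    for (int ii = 0; ii < n; ii++) {
        for (int jj = 0; jj < n; jj++) {
            int up = (ii + 1) % n, rt = (jj + 1) % n;
            int dn = (ii == 0) ? n - 1 : ii - 1;
            int lf = (jj == 0) ? n - 1 : jj - 1;
            double sum = grid[0][ii*n + jj];
            sum += (grid[1][ii*n + lf] = grid[3][ii*n + rt]);
            sum += (grid[2][dn*n + jj] = grid[4][up*n + jj]);
            sum += (grid[3][ii*n + rt] = grid[1][ii*n + lf]);
            sum += (grid[4][up*n + jj] = grid[2][dn*n + jj]);
            sum += (grid[5][dn*n + lf] = grid[7][up*n + rt]);
            sum += (grid[6][dn*n + rt] = grid[8][up*n + lf]);
            sum += (grid[7][up*n + rt] = grid[5][dn*n + lf]);
            sum += (grid[8][up*n + lf] = grid[6][dn*n + rt]);
            res += sum;
        }
    }
    return res;
}

/* Run `steps` time steps and return the elapsed CPU time in seconds.
 * The accumulated result is stored in *res so the work is not optimised away. */
double time_steps(double (*step)(double **, int), double **grid,
                  int n, int steps, double *res) {
    clock_t t0 = clock();
    double acc = 0.0;
    for (int tt = 0; tt < steps; tt++) acc += step(grid, n);
    *res = acc;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling time_steps(copy_only_step, grid, 256, 80000, &res) would correspond to the sizes used in the simplified program. Wrapping the full computation in a function with the same signature lets both variants be timed through the same harness.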
Vim command to swap the index order of every 2D array access in the program:
:%s/\[\([^\]]\+\)\]\[\([^\]]\+\)]/\[\2\]\[\1\]/g
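As a possible alternative to rewriting every access with a regex, the two layouts could be hidden behind an index function so that switching between them is a compile-time flag. The sketch below is an illustration only: it assumes a single flat buffer instead of the pointer-to-pointer allocation used in the program, and the names idx_soa, idx_aos, GRID_IDX, G, GRID_AOS and grid_flat_new are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NV 9 /* values per grid element, as in the question */

/* Structure of arrays (grid[9][ncells] flattened): value k of
 * neighbouring cells is adjacent in memory. */
static inline size_t idx_soa(size_t k, size_t cell, size_t ncells) {
    assert(k < NV && cell < ncells);
    return k * ncells + cell;
}

/* Array of structures (grid[ncells][9] flattened): the nine values
 * of one cell are adjacent in memory. */
static inline size_t idx_aos(size_t k, size_t cell, size_t ncells) {
    assert(k < NV && cell < ncells);
    return cell * NV + k;
}

/* Pick the layout at compile time, e.g. by building with -DGRID_AOS. */
#ifdef GRID_AOS
#define GRID_IDX idx_aos
#else
#define GRID_IDX idx_soa
#endif

/* Access value k of a cell in the flat buffer. */
#define G(grid, k, cell, ncells) ((grid)[GRID_IDX((k), (cell), (ncells))])

/* Assumed allocator: one contiguous, zero-initialised block for the grid. */
double *grid_flat_new(size_t ncells) {
    return calloc(NV * ncells, sizeof(double));
}
```

With this in place, an access such as grid[1][ii*256 + lf] would be written G(grid, 1, ii*256 + lf, 256*256), and the layout could be changed without touching the loop body.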