I am new to OpenMP and I am working on optimizing some code. This is the function in question:
int accelerate_flow(const t_param params, t_speed* cells, int* obstacles)
{
    int ii, jj;     /* generic counters */
    double w1, w2;  /* weighting factors */

    /* compute weighting factors */
    w1 = params.density * params.accel / 9.0;
    w2 = params.density * params.accel / 36.0;

    /* modify the 2nd row of the grid */
    ii = params.ny - 2;
    for (jj = 0; jj < params.nx; jj++) {
        /* if the cell is not occupied and
        ** we don't send a density negative */
        if (!obstacles[ii*params.nx + jj] &&
            (cells[ii*params.nx + jj].speeds[3] - w1) > 0.0 &&
            (cells[ii*params.nx + jj].speeds[6] - w2) > 0.0 &&
            (cells[ii*params.nx + jj].speeds[7] - w2) > 0.0) {
            /* increase 'east-side' densities */
            cells[ii*params.nx + jj].speeds[1] += w1;
            cells[ii*params.nx + jj].speeds[5] += w2;
            cells[ii*params.nx + jj].speeds[8] += w2;
            /* decrease 'west-side' densities */
            cells[ii*params.nx + jj].speeds[3] -= w1;
            cells[ii*params.nx + jj].speeds[6] -= w2;
            cells[ii*params.nx + jj].speeds[7] -= w2;
        }
    }
    return EXIT_SUCCESS;
}
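That is the original version. When I try to parallelize the for loop like this:
#pragma omp parallel for private(jj) shared(cells)
for(jj = 0; jj < params.nx; jj++)
.....
it becomes very slow. I don't know how to parallelize it better.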
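For reference, here is a minimal self-contained sketch of the parallel version described above. The definitions of `t_param` and `t_speed` are not shown in the post, so the ones below are assumptions (9 speeds per cell is inferred from the indices 0..8 used in the function), and the name `accelerate_flow_omp` is hypothetical. The `private(jj) shared(cells)` clauses are left out because the loop variable of a `parallel for` is private by default and `cells` is shared by default.

```c
#include <assert.h>
#include <stdlib.h>

#define NSPEEDS 9  /* assumption: indices 0..8 are used in the original function */

/* Hypothetical minimal definitions; the real ones are not shown in the post. */
typedef struct { int nx; int ny; double density; double accel; } t_param;
typedef struct { double speeds[NSPEEDS]; } t_speed;

int accelerate_flow_omp(const t_param params, t_speed* cells, const int* obstacles)
{
    assert(params.ny >= 2);  /* the function touches row ny - 2 */

    /* weighting factors, computed once outside the loop */
    const double w1 = params.density * params.accel / 9.0;
    const double w2 = params.density * params.accel / 36.0;

    /* offset of the first cell in the 2nd row of the grid */
    const int row = (params.ny - 2) * params.nx;

    /* each iteration touches only its own cell, so iterations are independent */
    #pragma omp parallel for schedule(static)
    for (int jj = 0; jj < params.nx; jj++) {
        t_speed* c = &cells[row + jj];

        /* if the cell is not occupied and no density would go negative */
        if (!obstacles[row + jj] &&
            (c->speeds[3] - w1) > 0.0 &&
            (c->speeds[6] - w2) > 0.0 &&
            (c->speeds[7] - w2) > 0.0) {
            /* increase 'east-side' densities */
            c->speeds[1] += w1;
            c->speeds[5] += w2;
            c->speeds[8] += w2;
            /* decrease 'west-side' densities */
            c->speeds[3] -= w1;
            c->speeds[6] -= w2;
            c->speeds[7] -= w2;
        }
    }
    return EXIT_SUCCESS;
}
```

With GCC this would be built with `-fopenmp`; without that flag the pragma is ignored and the loop runs serially. Each iteration does only a few comparisons and additions on a single row, so for small `params.nx` the cost of starting and synchronizing threads may well outweigh the work being shared out.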