I am trying to parallelize this code using OpenMP.
for(t_step=0;t_step<Ntot;t_step++) {
    // current row
    if(cur_row + 1 < Npt_x) cur_row++;
    else cur_row = 0;
    // get data from file which update only the row "cur_row" of array val
    read_line(f_u, val[cur_row]);
    // computes
    for(i=0;i<Npt_x;i++) {
        for(j=0;j<Npt_y;j++) {
            i_corrected = cur_row - i;
            if(i_corrected < 0) i_corrected = Npt_x + i_corrected;
            R[i][j] += val[cur_row][0]*val[i_corrected][j]/Ntot;
        }
    }
}
where
- val and R are declared as double**,
- Npt_x and Npt_y are about 500,
- Ntot is about 10^6.
This is what I have so far:
for(t_step=0;t_step<Ntot;t_step++) {
    // current row
    if(cur_row + 1 < Npt_x) cur_row++;
    else cur_row = 0;
    // get data from file which update only the row "cur_row" of array val
    read_line(f_u, val[cur_row]);
    // computes
    #pragma omp parallel for collapse(2), private(i,j,i_corrected)
    for(i=0;i<Npt_x;i++) {
        for(j=0;j<Npt_y;j++) {
            i_corrected = cur_row - i;
            if(i_corrected < 0) i_corrected = Npt_x + i_corrected;
            R[i][j] += val[cur_row][0]*val[i_corrected][j]/Ntot;
        }
    }
}
The problem is that it does not seem very efficient. Is there a way to use OpenMP more efficiently in this case?
Many thanks
Answer 0 (score: 1)
For now, I would try something like this:
for(t_step=0;t_step<Ntot;t_step++) {
    // current row
    if(cur_row + 1 < Npt_x)
        cur_row++;
    else
        cur_row = 0;
    // get data from file which update only the row "cur_row" of array val
    read_line(f_u, val[cur_row]);
    // computes
    #pragma omp parallel for private(i,j,i_corrected)
    for(i=0;i<Npt_x;i++) {
        i_corrected = cur_row - i;
        if(i_corrected < 0)
            i_corrected += Npt_x;
        double tmp = val[cur_row][0]/Ntot;
        #if defined(_OPENMP) && _OPENMP > 201306
        #pragma omp simd
        #endif
        for(j=0;j<Npt_y;j++) {
            R[i][j] += tmp*val[i_corrected][j];
        }
    }
}
However, since the code will be memory-bound, I am not sure this will get you much more parallel speedup... It is worth a try, though.
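One further variation that might be worth measuring: the code above opens a new parallel region on each of the roughly 10^6 time steps. The sketch below instead opens one parallel region around the whole time loop, lets a single thread load the row, and shares the i loop with `omp for`. Two barriers per step remain, so any gain is uncertain for a memory-bound loop like this one. Everything not in the question is an assumption made for illustration: the names `accumulate` and `alloc_matrix`, the helper `fill_row` (a hypothetical stand-in for `read_line(f_u, val[cur_row])` so that no input file is needed), and the choice that the first time step writes row 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: rows x cols matrix of zeros, addressed as m[i][j],
   backed by one contiguous allocation. Release with free(m[0]); free(m); */
double **alloc_matrix(int rows, int cols)
{
    double **m = malloc((size_t)rows * sizeof *m);
    double *data = calloc((size_t)rows * (size_t)cols, sizeof *data);
    if (!m || !data) { free(m); free(data); return NULL; }
    for (int i = 0; i < rows; i++)
        m[i] = data + (size_t)i * (size_t)cols;
    return m;
}

/* Hypothetical stand-in for read_line(f_u, val[cur_row]):
   fills one row with deterministic values. */
void fill_row(double *row, int n, long t_step)
{
    for (int j = 0; j < n; j++)
        row[j] = (double)((t_step + j) % 7);
}

/* Accumulate R over n_steps time steps inside a single parallel region. */
void accumulate(double **R, double **val, int npt_x, int npt_y, long n_steps)
{
    assert(npt_x > 0 && npt_y > 0 && n_steps > 0);

    #pragma omp parallel
    {
        /* Every thread runs the time loop; the work-sharing constructs
           inside decide which thread does what. */
        for (long t = 0; t < n_steps; t++) {
            /* Assumption: the first step writes row 0, then rows wrap. */
            int cur_row = (int)(t % npt_x);

            /* One thread loads the row. The implicit barrier at the end of
               "single" makes the new row visible to all threads. */
            #pragma omp single
            fill_row(val[cur_row], npt_y, t);

            const double tmp = val[cur_row][0] / (double)n_steps;

            /* Rows of R are split across threads, so no two threads write
               the same element. The implicit barrier at the end of "for"
               keeps the next step from overwriting a row still being read. */
            #pragma omp for schedule(static)
            for (int i = 0; i < npt_x; i++) {
                int i_corrected = cur_row - i;
                if (i_corrected < 0)
                    i_corrected += npt_x;
                const double *src = val[i_corrected];
                double *dst = R[i];
                for (int j = 0; j < npt_y; j++)
                    dst[j] += tmp * src[j];
            }
        }
    }
}
```

With GCC this can be built with something like `gcc -O2 -fopenmp`; without `-fopenmp` the pragmas are ignored and the function runs serially, which gives a convenient baseline for timing. In the real program the call to `fill_row` would be replaced by `read_line(f_u, val[cur_row])`, and if reading the file dominates the run time, neither version is likely to scale.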