OpenMP parallelization is not efficient

Time: 2015-10-22 17:13:11

Tags: c multithreading for-loop parallel-processing openmp

I am trying to parallelize this code using OpenMP.

for(t_step=0;t_step<Ntot;t_step++) {
        // current row
        if(cur_row + 1 < Npt_x)     cur_row++; 
        else                        cur_row = 0;
        // get data from file which update only the row "cur_row" of array val
        read_line(f_u, val[cur_row]);
        // computes
        for(i=0;i<Npt_x;i++) {
            for(j=0;j<Npt_y;j++) {
                i_corrected = cur_row - i;
                if(i_corrected < 0)     i_corrected = Npt_x + i_corrected;
                R[i][j] += val[cur_row][0]*val[i_corrected][j]/Ntot;
            }
        }
    }


  - val and R are declared as double**,
  - Npt_x and Npt_y are about 500,
  - Ntot is about 10^6.

Here is what I have done so far:

for(t_step=0;t_step<Ntot;t_step++) {
        // current row
        if(cur_row + 1 < Npt_x)     cur_row++; 
        else                        cur_row = 0;
        // get data from file which update only the row "cur_row" of array val
        read_line(f_u, val[cur_row]);
        // computes
        #pragma omp parallel for collapse(2), private(i,j,i_corrected)
        for(i=0;i<Npt_x;i++) {
            for(j=0;j<Npt_y;j++) {
                i_corrected = cur_row - i;
                if(i_corrected < 0)     i_corrected = Npt_x + i_corrected;
                R[i][j] += val[cur_row][0]*val[i_corrected][j]/Ntot;
            }
        }
    }

The problem is that it does not seem to be very efficient. Is there a way to use OpenMP more effectively in this case?

Many thanks.

1 answer:

Answer 0 (score: 1)

For now, I would try something like this:

for(t_step=0;t_step<Ntot;t_step++) {
    // current row
    if(cur_row + 1 < Npt_x)
        cur_row++; 
    else
        cur_row = 0;
    // get data from file which update only the row "cur_row" of array val
    read_line(f_u, val[cur_row]);
    // computes
    #pragma omp parallel for private(i,j,i_corrected)
    for(i=0;i<Npt_x;i++) {
        i_corrected = cur_row - i;
        if(i_corrected < 0)
            i_corrected += Npt_x;
        double tmp = val[cur_row][0]/Ntot;
        #if defined(_OPENMP) && _OPENMP > 201306
        #pragma omp simd
        #endif
        for(j=0;j<Npt_y;j++) {
            R[i][j] += tmp*val[i_corrected][j];
        }
    }
}

However, since the code will be memory-bound, I'm not sure this will get you much more parallel speed-up... Still, it is worth a try.
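To make the compute part easier to test and time on its own, here is a minimal sketch of one time step written as a standalone function. It follows the loop structure above, with a few assumptions that are not in the original code: the function name accumulate_step is hypothetical, R and val are taken as contiguous row-major arrays rather than double**, and the loop-invariant factor is computed once per step instead of once per row. Treat it as an illustration, not a drop-in replacement.

```c
#include <stddef.h>

/* One time step of the accumulation (hypothetical helper).
 * Assumption: R and val each hold npt_x * npt_y doubles in row-major order;
 * the question stores them as double** instead. */
void accumulate_step(double *R, const double *val,
                     int npt_x, int npt_y, int cur_row, double ntot)
{
    /* val[cur_row][0] / Ntot does not depend on i or j, so compute it once. */
    const double scale = val[(size_t)cur_row * npt_y] / ntot;

    #pragma omp parallel for
    for (int i = 0; i < npt_x; i++) {
        /* Wrap the row index around the circular buffer of rows. */
        int i_corrected = cur_row - i;
        if (i_corrected < 0)
            i_corrected += npt_x;

        double *r_row = R + (size_t)i * npt_y;
        const double *v_row = val + (size_t)i_corrected * npt_y;

        /* The simd construct needs OpenMP 4.0 or later. */
        #if defined(_OPENMP) && _OPENMP > 201306
        #pragma omp simd
        #endif
        for (int j = 0; j < npt_y; j++)
            r_row[j] += scale * v_row[j];
    }
}
```

Built with something like gcc -O2 -fopenmp the outer loop is shared among threads; without -fopenmp the pragmas are ignored and the same function runs serially, which gives a baseline for checking whether the threaded version actually helps. Because the scale factor is applied before the multiplication, results may differ from the original expression in the last bits.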