使用Fortran 90进行OpenMP和数组求和

时间:2012-05-30 06:51:30

标签: openmp multidimensional-array fortran90 do-loops

我正试图计算晶体结构的压力张量。 要做到这一点,我必须像下面的简化代码一样完成所有粒子对

  do i=1, atom_number      ! sum over atoms i
   type1 = ATOMS(i)
   do nj=POINT(i), POINT(i+1)-1  ! sum over atoms j of i's atoms list
       j = LIST(nj)
       type2 = ATOMS(j) 
       call get_ff_param(type1,type2,Aab,Bab,Cab,Dab)
       call distance_sqr2(i,j,r,VECT_R)
       call gettensor_rij(i,j,T)
       r = sqrt(r)
       if (r<coub_cutoff) then
          local_sum_real(id+1) = local_sum_real(id+1) + ( erfc(alpha_ewald*r) 
          derivative = -( erfc(alpha_ewald*r)
          ff = - (1.0d0/r)*derivative  
          STRESS_EWALDD = STRESS_EWALDD + ff * T  ! A 3x3 matrix    
          FDX(i) = FDX(i) + VECT_R(1) * ff
          FDY(i) = FDY(i) + VECT_R(2) * ff
          FDZ(i) = FDZ(i) + VECT_R(3) * ff
          FDX(j) = FDX(j) - VECT_R(1) * ff
          FDY(j) = FDY(j) - VECT_R(2) * ff
          FDZ(j) = FDZ(j) - VECT_R(3) * ff     
       end if                                  
   end do               ! sum over atoms j 

   sum_kin = sum_kin + ATMMASS(i) * (VX(i)**2 + VY(i)**2 + VZ(i)**2)  
   call gettensor_v(i,Q)
   STRESS_KINETIC = STRESS_KINETIC + ATMMASS(i) * Q

 end do 

我尝试将这个双循环与“paralle do”指令并行化但是在张量STRESS_EWALDD和强制FDX上遇到了一些问题.... 因此,我尝试为每个线程分配manualy一些像下面的代码中的粒子,但仍然得到错误的张量值。

!$omp parallel  private(id,j,i,nj,type1,type2,T,Q,r,VECT_R,derivative,Aab,Bab,Cab,Dab,ff)

 id = omp_get_thread_num()      
 do i=id+1, atom_number,nthreads       ! sum over atoms i
       type1 = ATOMS(i)
       do nj=POINT(i), POINT(i+1)-1  ! sum over atoms j of i's atoms list
           j = LIST(nj)
           type2 = ATOMS(j) 
           call get_ff_param(type1,type2,Aab,Bab,Cab,Dab)
           call distance_sqr2(i,j,r,VECT_R)
           call gettensor_rij(i,j,T)
           r = sqrt(r)
           if (r<coub_cutoff) then
              local_sum_real(id+1) = local_sum_real(id+1) + ( erfc(alpha_ewald*r) 
              derivative = -( erfc(alpha_ewald*r)
              ff = - (1.0d0/r)*derivative  
          local_STRESS_EWALDD(id+1,:,:) = local_STRESS_EWALDD(id+1,:,:) + ff * T    
          FDX(i) = FDX(i) + VECT_R(1) * ff
              FDY(i) = FDY(i) + VECT_R(2) * ff
              FDZ(i) = FDZ(i) + VECT_R(3) * ff
              FDX(j) = FDX(j) - VECT_R(1) * ff
              FDY(j) = FDY(j) - VECT_R(2) * ff
              FDZ(j) = FDZ(j) - VECT_R(3) * ff     
           end if                                  
       end do               ! sum over atoms j 

   local_sum_kin(id+1) = local_sum_kin(id+1) + ATMMASS(i) * (VX(i)**2 + VY(i)**2 + VZ(i)**2)  
   call gettensor_v(i,Q)
   local_STRESS_KINETIC(id+1,:,:) = local_STRESS_KINETIC(id+1,:,:) + ATMMASS(i) * Q

 end do               ! sum over atoms i

!$omp end parallel

 do i=1,nthreads  
   sum_real = sum_real + local_sum_real(i)
   sum_virial_buck = sum_virial_buck + local_sum_virial_buck(i)
   STRESS_BUCK = STRESS_BUCK + local_STRESS_BUCK(i,:,:)
   STRESS_EWALDD = STRESS_EWALDD + local_STRESS_EWALDD(i,:,:)   
   sum_buck = sum_buck + local_sum_buck(i)
   sum_kin  = sum_kin + local_sum_kin(i)
   STRESS_KINETIC = STRESS_KINETIC + local_STRESS_KINETIC(i,:,:) 
 end do 

Scalars和STRESS_KINETIC值是正确的,但STRESS_EWALDD是错误的,我无法弄清楚原因。 到目前为止我还不知道力量。 所以我真的很感激在这里点击。 谢谢,

      Éric.

1 个答案:

答案 0 :(得分:2)

您采用了一种非常规的方法来使用OpenMP。

OpenMP提供了reduction(op:vars)子句,该op子句使用varsSTRESS_EWALD中变量的本地值执行缩减,您应该将其用于sum_kinsum_realSTRESS_KINETICPOINT i -th atom的强制累积应该有效,因为Verlet列表!$OMP PARALLEL DO PRIVATE(....) !$OMP& REDUCTION(+:FDX,FDY,FDZ,sum_kin,sum_real,STRESS_EWALDD,STRESS_KINETIC) !$OMP& SCHEDULE(DYNAMIC) do i=1, atom_number ! sum over atoms i type1 = ATOMS(i) do nj=POINT(i), POINT(i+1)-1 ! sum over atoms j of i's atoms list j = LIST(nj) type2 = ATOMS(j) call get_ff_param(type1,type2,Aab,Bab,Cab,Dab) call distance_sqr2(i,j,r,VECT_R) call gettensor_rij(i,j,T) r = sqrt(r) if (r<coub_cutoff) then sum_real = sum_real + ( erfc(alpha_ewald*r) derivative = -( erfc(alpha_ewald*r) ff = - (1.0d0/r)*derivative STRESS_EWALDD = STRESS_EWALDD + ff * T ! A 3x3 matrix FDX(i) = FDX(i) + VECT_R(1) * ff FDY(i) = FDY(i) + VECT_R(2) * ff FDZ(i) = FDZ(i) + VECT_R(3) * ff FDX(j) = FDX(j) - VECT_R(1) * ff FDY(j) = FDY(j) - VECT_R(2) * ff FDZ(j) = FDZ(j) - VECT_R(3) * ff end if end do ! sum over atoms j sum_kin = sum_kin + ATMMASS(i) * (VX(i)**2 + VY(i)**2 + VZ(i)**2) call gettensor_v(i,Q) STRESS_KINETIC = STRESS_KINETIC + ATMMASS(i) * Q end do !$OMP END PARALLEL DO 中的原子是有序的(你从Allen&amp; Tildesley的书中获取构建它的代码,对吧?)但不适用于 j -th原子。这就是你应该对它们进行减少的原因。如果您在一些较旧的OpenMP手册中读到减少变量必须是标量,请不要担心 - 较新的OpenMP版本支持在Fortran 90 +中减少数组变量。

{{1}}

当每个原子的邻居数量变化很大时,使用动态循环调度将减少负载不平衡。