How to vectorize inner loops with omp simd

Time: 2014-04-01 11:31:34

Tags: fortran openmp nested-loops simd

I have three nested loops:

      !$omp parallel do schedule(runtime) private(s1)
      DO  k = 0, z
         !$omp simd collapse( 2 ) reduction( +: s1 )
         DO  i = 0, x
            DO  j =  0, z
               s1 = s1 + array(k,j,i)
            ENDDO
         ENDDO
         sums_l(k) = s1
      ENDDO

But the Intel compiler complains: warning #13379: loop was not vectorized with "simd". Why is that, and what can I do about it?

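// EDIT3: Here is the code that produces the error. It has been reduced to the minimum that still triggers it. If you remove literally anything, it vectorizes.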

SUBROUTINE simdTest

  IMPLICIT NONE

  INTEGER ::  i, j, k, sr, tn,nzb,nzt,nxl,nxr,nys,nyn
  REAL    ::  s1, s2, s3, s4
  REAL, DIMENSION(:,:,:), ALLOCATABLE :: u,v,pt,rmask,sums_l
  REAL, DIMENSION(:,:), ALLOCATABLE :: usws,vsws,shf

  !$omp parallel do schedule(runtime) private(s1,s2,s3)
  DO  k = nzb, nzt+1
    !$omp simd collapse( 2 ) reduction( +: s1, s2, s3 )
    DO  i = nxl, nxr
       DO  j =  nys, nyn
          s1 = s1 + u(k,j,i)  * rmask(j,i,sr)
          s2 = s2 + v(k,j,i)  * rmask(j,i,sr)
          s3 = s3 + pt(k,j,i) * rmask(j,i,sr)
       ENDDO
    ENDDO
    sums_l(k,1,tn) = s1
    sums_l(k,2,tn) = s2
    sums_l(k,4,tn) = s3
  ENDDO

  !$omp parallel do reduction( +: s1, s2, s3, s4) schedule(runtime)
  DO  i = nxl, nxr
   DO  j =  nys, nyn
      s1 = s1 + usws(j,i) * rmask(j,i,sr)
      s2 = s2 + vsws(j,i) * rmask(j,i,sr)
      s3 = s3 + shf(j,i)  * rmask(j,i,sr)
      s4 = s4 + 0.0
   ENDDO
  ENDDO
  sums_l(nzb,12,tn) = s1
  sums_l(nzb,14,tn) = s2
  sums_l(nzb,16,tn) = s3

END SUBROUTINE

1 answer:

Answer 0: (score: 0)

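This did not fit in the comments:

When I compile this on an Ivy Bridge CPU, I get the output below. The loop on line 15 cannot be vectorized for the CPU, but note that it IS VECTORIZED for the Intel MIC architecture. The loop on line 16 is vectorized on the CPU, while the target directive is dropped.

The cause of the vectorization problem is given in the first remark: "subscript too complex".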

ifort -openmp simd.f90 -warn -O3 -c -vec-report=3 -xHOST -fpp 
ifort: command line remark #10382: option '-xHOST' setting '-xCORE-AVX-I'
simd.f90(17): (col. 33) remark: loop was not vectorized: subscript too complex
simd.f90(15): (col. 5) warning #13379: loop was not vectorized with "simd"
simd.f90(16): (col. 8) remark: LOOP WAS VECTORIZED
simd.f90(13): (col. 3) remark: loop was not vectorized: not inner loop
simd.f90(13): (col. 3) remark: loop was not vectorized: not inner loop
simd.f90(31): (col. 4) remark: LOOP WAS VECTORIZED
simd.f90(30): (col. 3) remark: loop was not vectorized: not inner loop
simd.f90(29): (col. 7) remark: loop was not vectorized: not inner loop
simd.f90(29): (col. 7) remark: BLOCK WAS VECTORIZED
ifort: warning #10362: Environment configuration problem encountered.  Please check for proper MPSS installation and environment setup.
simd.f90(15): (col. 5) remark: *MIC* OpenMP SIMD LOOP WAS VECTORIZED
simd.f90(13): (col. 3) remark: *MIC* loop was not vectorized: not inner loop
simd.f90(13): (col. 3) remark: *MIC* loop was not vectorized: not inner loop
simd.f90(31): (col. 4) remark: *MIC* LOOP WAS VECTORIZED
simd.f90(31): (col. 4) remark: *MIC* PEEL LOOP WAS VECTORIZED
simd.f90(31): (col. 4) remark: *MIC* REMAINDER LOOP WAS VECTORIZED
simd.f90(30): (col. 3) remark: *MIC* loop was not vectorized: not inner loop
simd.f90(29): (col. 7) remark: *MIC* loop was not vectorized: not inner loop
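
Given the "subscript too complex" remark, one thing that may be worth trying is to drop collapse( 2 ) and put the simd directive on the innermost j loop only, so the compiler does not have to rebuild j and i from a single collapsed index. The following is only a sketch under assumptions, not a confirmed fix: the bounds and array contents are made up so the result can be checked, only one of the three sums is kept, and the sr index of rmask is left out. Note also that u(k,j,i) is strided in j, because k is the fastest-varying index in Fortran, so whether the loop vectorizes well (or at all) still depends on the compiler version and target.

```fortran
! Sketch only: bounds, shapes and values are hypothetical, chosen so that
! the sums are exact and easy to verify. Not the original simdTest routine.
program simd_inner_sketch
  implicit none
  integer, parameter :: nzb = 0, nzt = 3, nxl = 0, nxr = 7, nys = 0, nyn = 15
  integer :: i, j, k
  real :: s1
  real, allocatable :: u(:,:,:), rmask(:,:), sums(:)

  allocate(u(nzb:nzt+1, nys:nyn, nxl:nxr))
  allocate(rmask(nys:nyn, nxl:nxr))
  allocate(sums(nzb:nzt+1))

  ! Small integer values keep the floating-point sums exact in any order.
  do i = nxl, nxr
    do j = nys, nyn
      rmask(j,i) = 1.0
      do k = nzb, nzt+1
        u(k,j,i) = real(k + 1)
      end do
    end do
  end do

  !$omp parallel do private(i, j, s1)
  do k = nzb, nzt+1
    s1 = 0.0                      ! the private copy is reset for every k
    do i = nxl, nxr
      ! simd on the innermost loop only, no collapse
      !$omp simd reduction(+:s1)
      do j = nys, nyn
        s1 = s1 + u(k,j,i) * rmask(j,i)
      end do
    end do
    sums(k) = s1
  end do
  !$omp end parallel do

  ! Each layer k adds (k+1) over 8 * 16 = 128 grid points.
  do k = nzb, nzt+1
    if (abs(sums(k) - 128.0 * real(k + 1)) > 1.0e-3) error stop 'mismatch'
  end do
  print '(a)', 'ok'
end program simd_inner_sketch
```

Compiling this with the same flags as in the report above (or, for example, with gfortran -fopenmp) and reading the vectorization report for the j loop shows whether the compiler accepts this form. Without OpenMP enabled the directives are plain comments and the program still runs.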