Question

Unfortunately, the Intel compiler fails to vectorize the following i and j loops:

#pragma ivdep
#pragma vector always
for (  i = 1 ; i < N ; i++ ){ // <- not vectorized
    for (  j = 0 ; j < i ; j++ ){ // <- not vectorized
        // Matrix multiplication on C = x(i) * x(j)
        cblas_dgemm(..., &x[i*BLOCKSIZE*D], ..., &x[j*BLOCKSIZE*D], ..., C);
        int accum = 0;

        // following loop gets vectorized well
        #pragma omp parallel for reduction(+:accum) collapse(2)
        for ( int k = 0 ; k < BLOCKSIZE ; k++ ){
            for ( int l = 0 ; l < BLOCKSIZE ; l++ ){
                accum += C[k * NRC + l] + p[j*BLOCKSIZE + l] + p[i*BLOCKSIZE + k];
            }
        }

        total += accum;
    }
}

The vectorization report says:

LOOP BEGIN at i-th loop:
   remark #15521: loop was not vectorized: loop control variable was not identified. Explicitly compute the iteration count before executing the loop or try using canonical loop form from OpenMP specification

   LOOP BEGIN at j-th loop:
      remark #15521: loop was not vectorized: loop control variable was not identified. Explicitly compute the iteration count before executing the loop or try using canonical loop form from OpenMP specification
   LOOP END
LOOP END

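I am really confused, because I thought the control variables i and j were obvious, and I believe I am using the canonical loop form from the OpenMP specification. The k and l loops vectorize fine. Any guesses?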
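For reference, here is a minimal sketch of the same triangular i/j nest written so that the loop control variables are as easy as possible for a compiler to identify: the counters are plain `int`s declared in the `for` statements, and the bounds are copied into loop-invariant locals. This rests on an assumption the question does not confirm, namely that `i`, `j`, or `N` are declared somewhere the compiler cannot prove they stay unchanged across the `cblas_dgemm` call (for example as globals). The names `block_sum`, `triangular_total`, `bs`, and `nrc` are hypothetical stand-ins, the accumulator is assumed to be `double`, and the `cblas_dgemm` call is left out so the sketch compiles on its own.

```c
/* Hypothetical stand-in for the k/l reduction in the question: sums
   c[k*nrc + l] + p[j*bs + l] + p[i*bs + k] over one bs x bs block. */
static double block_sum(const double *c, const double *p,
                        int i, int j, int bs, int nrc)
{
    double accum = 0.0;
    for (int k = 0; k < bs; k++) {
        for (int l = 0; l < bs; l++) {
            accum += c[k * nrc + l] + p[j * bs + l] + p[i * bs + k];
        }
    }
    return accum;
}

/* Triangular i/j nest with block-local int counters and bounds held in
   loop-invariant locals, so the trip counts are known before each loop. */
double triangular_total(const double *c, const double *p,
                        int n, int bs, int nrc)
{
    const int i_end = n;          /* local copy of the outer bound */
    double total = 0.0;

    for (int i = 1; i < i_end; i++) {
        const int j_end = i;      /* inner bound fixed before the j loop */
        for (int j = 0; j < j_end; j++) {
            /* The cblas_dgemm call from the question would go here. */
            total += block_sum(c, p, i, j, bs, nrc);
        }
    }
    return total;
}
```

Even in this form, a call to an external routine such as `cblas_dgemm` inside the j loop would likely still keep the i and j loops from being vectorized, so this sketch mainly addresses the "loop control variable was not identified" remark rather than guaranteeing vectorization.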

Loop was not vectorized: loop control variable was not identified

0 answers: