C: Vectorize for loops

Time: 2017-02-28 09:12:13

Tags: c vectorization mathematical-optimization

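The critical part of my code consists of the following two loops. The first multiplies the complex vector out1 (dimension N) element-wise with each of the J length-N columns of the complex matrix B (dimension N×J) and stores the result in inc (dimension N×J). The second converts the complex matrix out2 (dimension N×J) into magnitude and phase and stores them contiguously in t (dimension N×2J): magnitudes first, then phases. inc, out1, out2 and B are all of type fftw_complex (two doubles), while t is float.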

for (int i = 0; i < N * J; i++)
{
    const int k = i % N;
    inc[i][REAL] = out1[k][REAL] * B[i][REAL] - out1[k][IMAG] * B[i][IMAG];
    inc[i][IMAG] = out1[k][REAL] * B[i][IMAG] + out1[k][IMAG] * B[i][REAL];
}

for (int i = 0; i < N * J; i++)
{
    t[i]         = (float) sqrt(out2[i][REAL] * out2[i][REAL] 
                             +  out2[i][IMAG] * out2[i][IMAG]);
    t[N * J + i] = (float) atan2(out2[i][IMAG], out2[i][REAL]);
}

When compiling with -Ofast -ftree-vectorize -fopt-info-vec-missed -mavx2 -msse4, the output for loop 1 is:

note: not vectorized: not suitable for gather load _50 = *_49[0];
note: bad data references.
note: not vectorized: not enough data-refs in basic block.
note: not consecutive access _50 = *_49[0];
note: Build SLP failed: unrolling required in basic block SLP
note: not consecutive access _50 = *_49[0];
note: Build SLP failed: unvectorizable statement _50 = *_49[0];
note: Build SLP failed: different interleaving chains in one node _60 = *_49[0];

and the output for loop 2 is:

note: versioning for alias required: can't determine dependence between *_70 and *_84
note: vector alignment may not be reachable
note: virtual phi. skip.
note: num. args = 4 (not unary/binary/ternary op).
note: not ssa-name.
note: use not simple.
note: no array mode for V4DF[2]
note: num. args = 4 (not unary/binary/ternary op).
note: not ssa-name.
note: use not simple.
note: no array mode for V4DF[2]
note: function is not vectorizable.
note: not vectorized: relevant stmt not supported: _85 = atan2 (_75, _73);
note: bad operation or unsupported loop bound.
note: versioning for alias required: can't determine dependence between *_70 and *_84
note: vector alignment may not be reachable
note: virtual phi. skip.
note: num. args = 4 (not unary/binary/ternary op).
note: not ssa-name.
note: use not simple.
note: no array mode for V2DF[2]
note: num. args = 4 (not unary/binary/ternary op).
note: not ssa-name.
note: use not simple.
note: no array mode for V2DF[2]
note: function is not vectorizable.
note: not vectorized: relevant stmt not supported: _85 = atan2 (_75, _73);
note: bad operation or unsupported loop bound.
note: not vectorized: no grouped stores in basic block.

I have found that these loops are the bottleneck in my code. How can I vectorize them?

1 Answer:

Answer 0 (score: 1)

My compilable version of the code is:

#include <math.h>

typedef double complex[2];
static const int REAL = 0;
static const int IMAG = 1;

void loop1(int N, int J, const complex B[], const complex out1[], complex inc[])
{
    const int NJ = N * J;
    for (int i = 0; i < NJ; ++i) {
        const int k = i % N;
        inc[i][IMAG] = out1[k][REAL] * B[i][IMAG] + out1[k][IMAG] * B[i][REAL];
        inc[i][REAL] = out1[k][REAL] * B[i][REAL] - out1[k][IMAG] * B[i][IMAG];
    }
}

void loop2(int N, int J, float t[], const complex out2[])
{
    const int NJ = N * J;
    float *const p = t + NJ;
    for (int i = 0; i < NJ; ++i) {
        /*t[i] = (float) hypot(out2[i][REAL], out2[i][IMAG]);*/
        t[i] = (float) sqrt(out2[i][REAL] * out2[i][REAL] + out2[i][IMAG] * out2[i][IMAG]);
        p[i] = (float) atan2(out2[i][IMAG], out2[i][REAL]);
    }
}

For the first loop, I get:

42504487.c:10:5: note: not vectorized: not suitable for gather load _16 = *_15[0];
42504487.c:10:5: note: bad data references.
42504487.c:10:5: note: not vectorized: not enough data-refs in basic block.
42504487.c:15:1: note: not vectorized: not enough data-refs in basic block.
42504487.c:10:5: note: Two or more load stmts share the same dr.
42504487.c:10:5: note: Two or more load stmts share the same dr.
42504487.c:10:5: note: Build SLP failed: unrolling required in basic block SLP
42504487.c:10:5: note: Two or more load stmts share the same dr.
42504487.c:10:5: note: Two or more load stmts share the same dr.
42504487.c:10:5: note: can't determine dependence between *_11[1] and *_15[1]
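The "not suitable for gather load" note most likely refers to the out1[k] reads: k = i % N is not a linear function of the loop counter, so the vectorizer treats the access as a gather. A minimal sketch (my own restructuring, not from either post; the names loop1_nested and cplx are hypothetical) removes the modulo by splitting the flat index i = j*N + n into two loops, and marks the pointers restrict so the "can't determine dependence" note should no longer apply. Whether GCC then vectorizes it still depends on how it handles the interleaved {re, im} pairs.

```c
#include <stddef.h>

typedef double cplx[2];          /* same layout as fftw_complex: {re, im} */
enum { REAL = 0, IMAG = 1 };

/* Hypothetical rewrite of loop1: the outer loop walks the J columns,
   the inner loop walks the N rows, so out1 is indexed by n directly
   (no i % N) and every access is unit-stride. restrict promises the
   compiler that inc does not alias B or out1. */
void loop1_nested(int N, int J,
                  const cplx *restrict B,
                  const cplx *restrict out1,
                  cplx *restrict inc)
{
    for (int j = 0; j < J; ++j) {
        const cplx *restrict Bj = B + (size_t)j * N;   /* column j of B   */
        cplx *restrict incj = inc + (size_t)j * N;     /* column j of inc */
        for (int n = 0; n < N; ++n) {
            const double ar = out1[n][REAL], ai = out1[n][IMAG];
            const double br = Bj[n][REAL],   bi = Bj[n][IMAG];
            incj[n][REAL] = ar * br - ai * bi;         /* Re(a * b) */
            incj[n][IMAG] = ar * bi + ai * br;         /* Im(a * b) */
        }
    }
}
```

The arithmetic is identical to the original loop; only the iteration order of the index computation changes, so the results match element for element.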

For the second loop, I get:

42504487.c:21:5: note: versioning for alias required: can't determine dependence between *_13 and *_25
42504487.c:21:5: note: vector alignment may not be reachable
42504487.c:21:5: note: virtual phi. skip.
42504487.c:21:5: note: num. args = 4 (not unary/binary/ternary op).
42504487.c:21:5: note: not ssa-name.
42504487.c:21:5: note: use not simple.
42504487.c:21:5: note: no array mode for V4DF[2]
42504487.c:21:5: note: num. args = 4 (not unary/binary/ternary op).
42504487.c:21:5: note: not ssa-name.
42504487.c:21:5: note: use not simple.
42504487.c:21:5: note: no array mode for V4DF[2]
42504487.c:21:5: note: function is not vectorizable.
42504487.c:21:5: note: not vectorized: relevant stmt not supported: _26 = atan2 (_19, _17);
42504487.c:21:5: note: bad operation or unsupported loop bound.
42504487.c:21:5: note: versioning for alias required: can't determine dependence between *_13 and *_25
42504487.c:21:5: note: vector alignment may not be reachable
42504487.c:21:5: note: virtual phi. skip.
42504487.c:21:5: note: num. args = 4 (not unary/binary/ternary op).
42504487.c:21:5: note: not ssa-name.
42504487.c:21:5: note: use not simple.
42504487.c:21:5: note: no array mode for V2DF[2]
42504487.c:21:5: note: num. args = 4 (not unary/binary/ternary op).
42504487.c:21:5: note: not ssa-name.
42504487.c:21:5: note: use not simple.
42504487.c:21:5: note: no array mode for V2DF[2]
42504487.c:21:5: note: function is not vectorizable.
42504487.c:21:5: note: not vectorized: relevant stmt not supported: _26 = atan2 (_19, _17);
42504487.c:21:5: note: bad operation or unsupported loop bound.
42504487.c:21:5: note: not vectorized: not enough data-refs in basic block.
42504487.c:26:1: note: not vectorized: not enough data-refs in basic block.
42504487.c:21:5: note: not vectorized: no grouped stores in basic block.

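Here we get "no array mode for V4DF[2]" and "no array mode for V2DF[2]", which indicates that there is no suitable vector type: the target has no mode for loading the interleaved real/imaginary pairs of doubles into two separate vectors in one operation.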
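One common way around the interleaved layout, offered here as an assumption rather than something tested against the original code, is a split (structure-of-arrays) layout in which real and imaginary parts live in separate double arrays. This requires changing how the data is produced: either converting FFTW's interleaved output, or switching the plans to FFTW's split-array guru interface. The function and parameter names below are hypothetical.

```c
#include <stddef.h>

/* Hypothetical split-layout version of loop1: Bre/Bim, o1re/o1im and
   incre/incim are plain double arrays, so every load and store is a
   contiguous run of doubles that maps directly onto SSE/AVX registers,
   with no de-interleaving shuffles. */
void loop1_split(int N, int J,
                 const double *restrict Bre,  const double *restrict Bim,
                 const double *restrict o1re, const double *restrict o1im,
                 double *restrict incre, double *restrict incim)
{
    for (int j = 0; j < J; ++j) {
        const size_t off = (size_t)j * N;     /* start of column j */
        for (int n = 0; n < N; ++n) {
            incre[off + n] = o1re[n] * Bre[off + n] - o1im[n] * Bim[off + n];
            incim[off + n] = o1re[n] * Bim[off + n] + o1im[n] * Bre[off + n];
        }
    }
}
```

The same idea applies to out2 in the second loop: with separate real and imaginary arrays, the magnitude computation reads two unit-stride arrays instead of pairs.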

Also, "relevant stmt not supported: atan2" tells us that there is no vector implementation of atan2 available to the vectorizer.

At this point, if enough cores are available, I would opt for OpenMP, perhaps together with -floop-parallelize-all.