Vectorizing a function in clang

Date: 2016-05-20 16:11:59

Tags: c++ vector simd clang++

I am trying to vectorize the following function with clang, following this clang reference. It takes a vector of bytes and applies a mask to it according to this RFC.

static void apply_mask(vector<uint8_t> &payload, uint8_t (&masking_key)[4]) {
  #pragma clang loop vectorize(enable) interleave(enable)
  for (size_t i = 0; i < payload.size(); i++) {
    payload[i] = payload[i] ^ masking_key[i % 4];
  }
}

The following flags are passed to clang:

-O3
-Rpass=loop-vectorize
-Rpass-analysis=loop-vectorize

However, vectorization fails with the following error:

WebSocket.cpp:5:
WebSocket.h:14:
In file included from boost/asio/io_service.hpp:767:
In file included from boost/asio/impl/io_service.hpp:19:
In file included from boost/asio/detail/service_registry.hpp:143:
In file included from boost/asio/detail/impl/service_registry.ipp:19:
c++/v1/vector:1498:18: remark: loop not vectorized: could not determine number
      of loop iterations [-Rpass-analysis]
    return this->__begin_[__n];
                 ^
c++/v1/vector:1498:18: error: loop not vectorized: failed explicitly specified
      loop vectorization [-Werror,-Wpass-failed]

How can I vectorize this for loop?

1 Answer:

Answer 0 (score: 4)

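Thanks to @PaulR and @PeterCordes. Unrolling the loop by a factor of 4 works: the main loop XORs one 32-bit word at a time, and a scalar loop handles the remaining 0 to 3 bytes.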

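void apply_mask(vector<uint8_t> &payload, const uint8_t (&masking_key)[4]) {
  const size_t size = payload.size();
  const size_t size4 = size / 4;  // number of whole 32-bit words
  size_t i = 0;
  uint8_t *p = payload.data();  // unlike &payload[0], valid for an empty vector
  // Type punning: accessing uint8_t storage through uint32_t is not sanctioned
  // by the strict-aliasing rules, and masking_key may not be 4-byte aligned.
  uint32_t *p32 = reinterpret_cast<uint32_t *>(p);
  const uint32_t m = *reinterpret_cast<const uint32_t *>(&masking_key[0]);

  // Main loop: XOR four payload bytes at a time with the 32-bit key.
#pragma clang loop vectorize(enable) interleave(enable)
  for (i = 0; i < size4; i++) {
    p32[i] = p32[i] ^ m;
  }

  // Tail: the remaining size % 4 bytes, masked one at a time.
  for (i = (size4*4); i < size; i++) {
    p[i] = p[i] ^ masking_key[i % 4];
  }
}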

gcc.godbolt code
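
The answer's code reads and writes the byte buffer through a uint32_t pointer, which the C++ aliasing rules do not formally permit. Below is a minimal sketch of the same word-at-a-time idea using std::memcpy for the 32-bit loads and stores. The name apply_mask_memcpy is hypothetical and not part of the original answer. Compilers generally lower a fixed-size 4-byte memcpy to a plain load or store, but whether clang then vectorizes this loop is an assumption to verify with -Rpass=loop-vectorize, so the loop pragma is left out here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical variant of apply_mask: same result, but all 32-bit accesses
// go through std::memcpy, so there is no aliasing or alignment assumption.
void apply_mask_memcpy(std::vector<std::uint8_t> &payload,
                       const std::uint8_t (&masking_key)[4]) {
  const std::size_t size = payload.size();
  const std::size_t words = size / 4;  // number of whole 32-bit words
  std::uint8_t *p = payload.data();

  // Pack the 4 key bytes into one word in memory order. Payload words are
  // loaded the same way, so the result does not depend on endianness.
  std::uint32_t m;
  std::memcpy(&m, masking_key, sizeof m);

  // Main loop: load 4 bytes, XOR with the key word, store them back.
  for (std::size_t i = 0; i < words; i++) {
    std::uint32_t w;
    std::memcpy(&w, p + 4 * i, sizeof w);
    w ^= m;
    std::memcpy(p + 4 * i, &w, sizeof w);
  }

  // Tail: the remaining size % 4 bytes, masked one at a time.
  for (std::size_t i = words * 4; i < size; i++) {
    p[i] ^= masking_key[i % 4];
  }
}
```

Since XOR is its own inverse, calling the function twice with the same key restores the original payload, which makes for an easy sanity check against the byte-wise loop from the question.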