Question

I am trying to get some C++ code vectorized, but GCC 4.8.4 refuses to do it.

I am compiling with the flags -mavx -O3 -felide-constructors -funroll-loops -fstrict-aliasing -Wdisabled-optimization. I also tried -O2 -ftree-vectorize instead of -O3, with the same result.

Here is the output with -ftree-vectorizer-verbose=7:

Analyzing loop at test.C:373

test.C:373: note: ===== analyze_loop_nest =====
test.C:373: note: === vect_analyze_loop_form ===
test.C:373: note: === get_loop_niters ===
test.C:373: note: ==> get_loop_niters:(unsigned int) prephitmp_2594
test.C:373: note: Symbolic number of iterations is (unsigned int) prephitmp_2594
test.C:373: note: === vect_analyze_data_refs ===

test.C:373: note: get vectype with 4 units of type const value_type
test.C:373: note: vectype: const vector(4) double
test.C:373: note: get vectype with 4 units of type const double
test.C:373: note: vectype: const vector(4) double
test.C:373: note: get vectype with 4 units of type const double
test.C:373: note: vectype: const vector(4) double
test.C:373: note: get vectype with 4 units of type const double
test.C:373: note: vectype: const vector(4) double
test.C:373: note: get vectype with 4 units of type const double
test.C:373: note: vectype: const vector(4) double
test.C:373: note: get vectype with 4 units of type const double
test.C:373: note: vectype: const vector(4) double
test.C:373: note: get vectype with 4 units of type const double
test.C:373: note: vectype: const vector(4) double
test.C:373: note: get vectype with 4 units of type double
test.C:373: note: vectype: vector(4) double
test.C:373: note: not vectorized: not suitable for gather load _199 = MEM[(double &)_407];

test.C:373: note: bad data references.
test.C:373: note: ***** Re-trying analysis with vector size 16

test.C:373: note: === vect_analyze_loop_form ===
test.C:373: note: === get_loop_niters ===
test.C:373: note: ==> get_loop_niters:(unsigned int) prephitmp_2594
test.C:373: note: Symbolic number of iterations is (unsigned int) prephitmp_2594
test.C:373: note: === vect_analyze_data_refs ===

test.C:373: note: get vectype with 2 units of type const value_type
test.C:373: note: vectype: const vector(2) double
test.C:373: note: get vectype with 2 units of type const double
test.C:373: note: vectype: const vector(2) double
test.C:373: note: get vectype with 2 units of type const double
test.C:373: note: vectype: const vector(2) double
test.C:373: note: get vectype with 2 units of type const double
test.C:373: note: vectype: const vector(2) double
test.C:373: note: get vectype with 2 units of type const double
test.C:373: note: vectype: const vector(2) double
test.C:373: note: get vectype with 2 units of type const double
test.C:373: note: vectype: const vector(2) double
test.C:373: note: get vectype with 2 units of type const double
test.C:373: note: vectype: const vector(2) double
test.C:373: note: get vectype with 2 units of type double
test.C:373: note: vectype: vector(2) double
test.C:373: note: not vectorized: not suitable for gather load _199 = MEM[(double &)_407];

test.C:373: note: bad data references.

What does "not suitable for gather load" mean? Also, what do all the "get vectype" and "vectype" lines mean?

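The loop reads the data of a std::vector<> and visits the elements sequentially from beginning to end. I see no reason why it should not vectorize, and I cannot figure out what GCC is trying to tell me.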

I know I have not posted the actual code, so I do not expect any help fixing it. I just want to know how to parse GCC's output.

Update

The Intel compiler vectorizes this loop without any pragmas.

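Update 2

The basic idea of my code is that I have a 2D C-style array that I loop over, doing some basic arithmetic with the stored values. I wrote a class that uses a 1D C-style array as the underlying data structure and overloads the [][] operator to mimic the same functionality. The 1D array stores the data with the appropriate stride, so iterating over the 2D version in the usual C-style order produces the same traversal of the 1D array. I verified the data layout myself with a small data set, so the functionality and the underlying storage are correct and the loop should vectorize. I do not understand why it does not, and I need help decoding GCC's output.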
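Since the real code is not shown, here is a minimal sketch of the kind of class described above, purely as an illustration. The names `Grid2D` and `scale_add` are hypothetical and not taken from the original code. As far as I understand GCC's vectorizer, the "not suitable for gather load" note tends to appear when it cannot describe a load address as a simple base plus constant stride in the loop counter and then fails to treat it as a gather. One thing that may be worth trying in that situation is to hoist the row pointers out of the inner loop, so the inner accesses are plainly unit-stride:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D view over contiguous, row-major 1D storage.
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // grid[i] returns a pointer to the start of row i, so grid[i][j] works.
    double* operator[](std::size_t i) { return data_.data() + i * cols_; }
    const double* operator[](std::size_t i) const { return data_.data() + i * cols_; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// Computes out[i][j] = a[i][j] * s + b[i][j] for grids of equal shape.
void scale_add(const Grid2D& a, const Grid2D& b, double s, Grid2D& out) {
    assert(a.rows() == b.rows() && a.cols() == b.cols());
    assert(a.rows() == out.rows() && a.cols() == out.cols());
    for (std::size_t i = 0; i < a.rows(); ++i) {
        // Row pointers are computed once per row; the inner loop then
        // indexes them with a plain unit-stride counter.
        const double* ra = a[i];
        const double* rb = b[i];
        double* ro = out[i];
        for (std::size_t j = 0; j < a.cols(); ++j) {
            ro[j] = ra[j] * s + rb[j];
        }
    }
}
```

Whether GCC 4.8.4 vectorizes this exact loop is not something I have verified. Note also that if `out` may alias `a` or `b`, the compiler might need a runtime alias check or give up.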

Interpreting the GCC vectorization report

0 answers: