I am trying to vectorize some C++ code, but GCC 4.8.4 will not do it.
I am using the flags -mavx -O3 -felide-constructors -funroll-loops -fstrict-aliasing -Wdisabled-optimization. I also tried -O2 -ftree-vectorize in place of -O3, with the same result.
Here is the output with -ftree-vectorizer-verbose=7:
Analyzing loop at test.C:373
test.C:373: note: ===== analyze_loop_nest =====
test.C:373: note: === vect_analyze_loop_form ===
test.C:373: note: === get_loop_niters ===
test.C:373: note: ==> get_loop_niters:(unsigned int) prephitmp_2594
test.C:373: note: Symbolic number of iterations is (unsigned int) prephitmp_2594
test.C:373: note: === vect_analyze_data_refs ===
test.C:373: note: get vectype with 4 units of type const value_type
test.C:373: note: vectype: const vector(4) double
test.C:373: note: get vectype with 4 units of type const double
test.C:373: note: vectype: const vector(4) double
test.C:373: note: get vectype with 4 units of type const double
test.C:373: note: vectype: const vector(4) double
test.C:373: note: get vectype with 4 units of type const double
test.C:373: note: vectype: const vector(4) double
test.C:373: note: get vectype with 4 units of type const double
test.C:373: note: vectype: const vector(4) double
test.C:373: note: get vectype with 4 units of type const double
test.C:373: note: vectype: const vector(4) double
test.C:373: note: get vectype with 4 units of type const double
test.C:373: note: vectype: const vector(4) double
test.C:373: note: get vectype with 4 units of type double
test.C:373: note: vectype: vector(4) double
test.C:373: note: not vectorized: not suitable for gather load _199 = MEM[(double &)_407];
test.C:373: note: bad data references.
test.C:373: note: ***** Re-trying analysis with vector size 16
test.C:373: note: === vect_analyze_loop_form ===
test.C:373: note: === get_loop_niters ===
test.C:373: note: ==> get_loop_niters:(unsigned int) prephitmp_2594
test.C:373: note: Symbolic number of iterations is (unsigned int) prephitmp_2594
test.C:373: note: === vect_analyze_data_refs ===
test.C:373: note: get vectype with 2 units of type const value_type
test.C:373: note: vectype: const vector(2) double
test.C:373: note: get vectype with 2 units of type const double
test.C:373: note: vectype: const vector(2) double
test.C:373: note: get vectype with 2 units of type const double
test.C:373: note: vectype: const vector(2) double
test.C:373: note: get vectype with 2 units of type const double
test.C:373: note: vectype: const vector(2) double
test.C:373: note: get vectype with 2 units of type const double
test.C:373: note: vectype: const vector(2) double
test.C:373: note: get vectype with 2 units of type const double
test.C:373: note: vectype: const vector(2) double
test.C:373: note: get vectype with 2 units of type const double
test.C:373: note: vectype: const vector(2) double
test.C:373: note: get vectype with 2 units of type double
test.C:373: note: vectype: vector(2) double
test.C:373: note: not vectorized: not suitable for gather load _199 = MEM[(double &)_407];
test.C:373: note: bad data references.
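What does "not suitable for gather load" mean? Also, what do all the "get vectype" and "vectype" lines mean?
The loop accesses the data of a std::vector<>, reading the elements sequentially from start to finish. I see no reason why it should not be vectorized, and I cannot figure out what GCC is trying to tell me.
I know I have not posted the actual code, so I do not expect help fixing it. I just want to know how to interpret GCC's output.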
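To make the terminology concrete: a gather load generally means a vector load whose lanes come from non-adjacent addresses (for example, through an index array), as opposed to a single contiguous load. The sketch below is illustrative only and is not the loop from test.C; the function names sum_contiguous and sum_indexed are hypothetical. The message in the log suggests that GCC could not describe the access at _407 as a simple base-plus-stride pattern, although the actual cause cannot be determined from the log alone.

```cpp
#include <cstddef>
#include <vector>

// Contiguous access: iteration i reads the element at base + i * sizeof(double),
// so several consecutive iterations can be served by one vector load.
double sum_contiguous(const std::vector<double>& v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Indexed access: the address of each element depends on idx[i], so the
// elements needed by one vector are not adjacent in memory. Loading them
// as a vector requires a gather-style load.
// Precondition (assumed, not checked): every idx[i] is < v.size().
double sum_indexed(const std::vector<double>& v,
                   const std::vector<std::size_t>& idx) {
    double total = 0.0;
    for (std::size_t i = 0; i < idx.size(); ++i) {
        total += v[idx[i]];
    }
    return total;
}
```

Both functions compute a plain sum; only the addressing pattern differs, and that is the property the vectorizer's data-reference analysis looks at.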
Update
The Intel compiler vectorizes this loop without any pragmas.
Update 2
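The basic idea of my code is that I have a 2D C-style array that I loop over, doing some basic arithmetic with the stored values. I wrote a class that uses a 1D C-style array as the underlying data structure and overloaded the [][] operator to mimic the same functionality. The 1D array stores the data with the appropriate stride, so iterating over the 2D version in proper C-style order produces the same iteration order in 1D. I have personally verified with a small data set that the data is stored correctly, so the functionality and the underlying storage are correct and the loop should vectorize. I do not understand why it does not, and I need help decoding GCC's output.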
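A minimal sketch of the kind of class described above, assuming row-major storage and C++11 (for std::vector::data()). The names Grid2D and sum_all are hypothetical, and std::vector is used here for the 1D storage instead of a raw C-style array. This is not the actual code from test.C, and the sketch does not establish whether GCC 4.8.4 would vectorize it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2D view over 1D row-major storage.
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // Returning a pointer to the start of row r lets g[r][c]
    // resolve to data_[r * cols_ + c].
    double* operator[](std::size_t r) { return data_.data() + r * cols_; }
    const double* operator[](std::size_t r) const {
        return data_.data() + r * cols_;
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// Row-major traversal: the inner loop walks one row with unit stride.
double sum_all(const Grid2D& g) {
    double total = 0.0;
    for (std::size_t r = 0; r < g.rows(); ++r) {
        const double* row = g[r];  // row pointer computed once per row
        for (std::size_t c = 0; c < g.cols(); ++c) {
            total += row[c];
        }
    }
    return total;
}
```

Computing the row pointer once per row keeps the inner loop's access in the form row[c], which is the contiguous pattern described in the question.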