C++: Compiling a large number of expressions fails with -O2 optimization?

Time: 2017-05-20 15:32:06

Tags: c++ gcc compiler-errors compilation

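I am using the Eigen library for some matrix computations. I have to define a large matrix (actually not that large, only 300x300), in which each element consists of a long complex exponential expression.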

To give an impression of what I mean, here is a small part of the matrix definition:

#include <iostream>
#include <complex>
#include <Eigen/Dense>
using namespace Eigen;

int main()
{
typedef std::complex<double> cd;
MatrixXcd h(300,300);
double kx,ky;
kx=1.;
ky=1.;
h.setZero(300,300);
h(0,0)=cd(6.942755,0.) + 0.043986/exp(cd(0,1)*(0. - 2.0238820899708214*kx - 7.55323078829979*ky)) - 0.010802/exp(cd(0,1)*(0. + 5.529348698328969*kx - 5.529348698328969*ky)) + 0.043986/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky)) + 0.043986/exp(cd(0,1)*(0. + 7.55323078829979*kx + 2.0238820899708214*ky)) - 0.010802/exp(cd(0,1)*(0. - 5.529348698328969*kx + 5.529348698328969*ky)) + 0.043986/exp(cd(0,1)*(0. + 2.0238820899708214*kx + 7.55323078829979*ky));
h(0,2)=cd(0.,0.) + 0.095916/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky)) - 0.131689/exp(cd(0,1)*(0. + 7.55323078829979*kx + 2.0238820899708214*ky));
h(0,3)=cd(-0.10825,0.) - 0.011519/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky));
...
...//6000 more lines omitted here
}

I am using mingw-w64 on Windows, and the compiler itself is set up correctly. But when I compile the code above with

g++ -O2 code.cpp

the compilation fails and an error dialog pops up!


If I watch Task Manager closely, the compilation stops when memory usage is at about 1 GB.

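However, if I compile the code again with -O0, i.e. with all optimizations disabled, the compilation succeeds, even though memory usage peaks at nearly 2 GB. So the failure is definitely not caused by running out of memory.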

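Moreover, I can confirm that this behavior has nothing to do with the Eigen library. It persists even if I drop Eigen and replace all the assignments with assignments to a single variable, like this: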

#include <iostream>
#include <complex>

int main()
{
typedef std::complex<double> cd;
cd tmp;
double kx,ky;
kx=1.;
ky=1.;
tmp=cd(6.942755,0.) + 0.043986/exp(cd(0,1)*(0. - 2.0238820899708214*kx - 7.55323078829979*ky)) - 0.010802/exp(cd(0,1)*(0. + 5.529348698328969*kx - 5.529348698328969*ky)) + 0.043986/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky)) + 0.043986/exp(cd(0,1)*(0. + 7.55323078829979*kx + 2.0238820899708214*ky)) - 0.010802/exp(cd(0,1)*(0. - 5.529348698328969*kx + 5.529348698328969*ky)) + 0.043986/exp(cd(0,1)*(0. + 2.0238820899708214*kx + 7.55323078829979*ky));
tmp=cd(0.,0.) + 0.095916/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky)) - 0.131689/exp(cd(0,1)*(0. + 7.55323078829979*kx + 2.0238820899708214*ky));
tmp=cd(-0.10825,0.) - 0.011519/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky));
... //6000 more lines omitted
}

Compilation with -O2 fails here as well.

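The problem is also not limited to the mingw compiler. I tried icl.exe from Intel Parallel Studio too. The situation is even worse there: the compilation ran for more than 30 minutes and looked like it would keep going. I did not have the patience to wait for it to finish, and it might well have failed in the end anyway.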

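So my questions are: what causes the -O2 compilation to fail, and how can I make -O2 work for my code (with its large number of expressions)? It also surprises me that, although there are many expressions, they consist only of the basic exp function, so why does compilation take so much time and memory? Are there any tricks to make compilation faster?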

Update

Following Marc Glisse's suggestion, I ran the following. -O1 works, but I want at least -O2, because this code is for scientific computing, where speed matters.

R:\>g++ -O1  -ftime-report  eigen.cpp

Execution times (seconds)
 phase setup             :   0.01 ( 0%) usr    1540 kB ( 0%) ggc
 phase parsing           :   6.06 ( 5%) usr  412774 kB (25%) ggc
 phase lang. deferred    :   0.18 ( 0%) usr    6491 kB ( 0%) ggc
 phase opt and generate  : 122.65 (95%) usr 1203926 kB (74%) ggc
 |name lookup            :   0.61 ( 0%) usr   39968 kB ( 2%) ggc
 |overload resolution    :   2.18 ( 2%) usr  151685 kB ( 9%) ggc
 garbage collection      :   1.48 ( 1%) usr       0 kB ( 0%) ggc
 callgraph construction  :   0.65 ( 1%) usr   28545 kB ( 2%) ggc
 callgraph optimization  :   0.41 ( 0%) usr       6 kB ( 0%) ggc
 ipa dead code removal   :   0.02 ( 0%) usr       0 kB ( 0%) ggc
 ipa inlining heuristics :   0.58 ( 0%) usr    6172 kB ( 0%) ggc
 ipa reference           :   0.02 ( 0%) usr       0 kB ( 0%) ggc
 ipa profile             :   0.11 ( 0%) usr       0 kB ( 0%) ggc
 ipa pure const          :   0.20 ( 0%) usr       0 kB ( 0%) ggc
 cfg cleanup             :   0.04 ( 0%) usr       0 kB ( 0%) ggc
 trivially dead code     :   0.05 ( 0%) usr       0 kB ( 0%) ggc
 df scan insns           :   0.09 ( 0%) usr       0 kB ( 0%) ggc
 df multiple defs        :   0.03 ( 0%) usr       0 kB ( 0%) ggc
 df live regs            :   0.13 ( 0%) usr       0 kB ( 0%) ggc
 df live&initialized regs:   0.04 ( 0%) usr       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.17 ( 0%) usr    2440 kB ( 0%) ggc
 register information    :   0.01 ( 0%) usr       0 kB ( 0%) ggc
 alias analysis          :   0.05 ( 0%) usr    1546 kB ( 0%) ggc
 alias stmt walking      :  27.43 (21%) usr   19006 kB ( 1%) ggc
 rebuild jump labels     :   0.03 ( 0%) usr       0 kB ( 0%) ggc
 preprocessing           :   0.63 ( 0%) usr    8732 kB ( 1%) ggc
 parser (global)         :   0.30 ( 0%) usr   80513 kB ( 5%) ggc
 parser struct body      :   0.36 ( 0%) usr   20184 kB ( 1%) ggc
 parser enumerator list  :   0.03 ( 0%) usr    1004 kB ( 0%) ggc
 parser function body    :   3.52 ( 3%) usr  253532 kB (16%) ggc
 parser inl. func. body  :   0.16 ( 0%) usr    6243 kB ( 0%) ggc
 parser inl. meth. body  :   0.24 ( 0%) usr   12261 kB ( 1%) ggc
 template instantiation  :   0.75 ( 1%) usr   36791 kB ( 2%) ggc
 early inlining heuristics:   0.74 ( 1%) usr   78738 kB ( 5%) ggc
 inline parameters       :   0.60 ( 0%) usr    3273 kB ( 0%) ggc
 integration             :  34.96 (27%) usr  421223 kB (26%) ggc
 tree gimplify           :   0.93 ( 1%) usr   78917 kB ( 5%) ggc
 tree eh                 :   1.81 ( 1%) usr  147729 kB ( 9%) ggc
 tree CFG construction   :   0.26 ( 0%) usr   47487 kB ( 3%) ggc
 tree CFG cleanup        :   0.92 ( 1%) usr       0 kB ( 0%) ggc
 tree copy propagation   :   0.03 ( 0%) usr       0 kB ( 0%) ggc
 tree PTA                :   1.80 ( 1%) usr     167 kB ( 0%) ggc
 tree PHI insertion      :   0.07 ( 0%) usr     519 kB ( 0%) ggc
 tree SSA rewrite        :   1.63 ( 1%) usr   97983 kB ( 6%) ggc
 tree SSA other          :   0.13 ( 0%) usr      17 kB ( 0%) ggc
 tree SSA incremental    :  28.75 (22%) usr       5 kB ( 0%) ggc
 tree operand scan       :   2.13 ( 2%) usr   65917 kB ( 4%) ggc
 dominator optimization  :   0.08 ( 0%) usr    2043 kB ( 0%) ggc
 tree SRA                :   2.65 ( 2%) usr   56210 kB ( 3%) ggc
 tree CCP                :   2.42 ( 2%) usr   37765 kB ( 2%) ggc
 tree split crit edges   :   0.11 ( 0%) usr    2953 kB ( 0%) ggc
 tree reassociation      :   0.04 ( 0%) usr       0 kB ( 0%) ggc
 tree FRE                :   3.35 ( 3%) usr   35524 kB ( 2%) ggc
 tree code sinking       :   0.01 ( 0%) usr       0 kB ( 0%) ggc
 tree linearize phis     :   0.01 ( 0%) usr       6 kB ( 0%) ggc
 tree backward propagate :   0.02 ( 0%) usr       0 kB ( 0%) ggc
 tree forward propagate  :   0.38 ( 0%) usr       8 kB ( 0%) ggc
 tree conservative DCE   :   0.13 ( 0%) usr       1 kB ( 0%) ggc
 tree aggressive DCE     :   0.33 ( 0%) usr       2 kB ( 0%) ggc
 tree DSE                :   0.45 ( 0%) usr       4 kB ( 0%) ggc
 tree SSA uncprop        :   0.01 ( 0%) usr       0 kB ( 0%) ggc
 dominance frontiers     :   0.06 ( 0%) usr       0 kB ( 0%) ggc
 dominance computation   :   0.65 ( 1%) usr       0 kB ( 0%) ggc
 out of ssa              :   0.09 ( 0%) usr       1 kB ( 0%) ggc
 expand vars             :   0.02 ( 0%) usr     765 kB ( 0%) ggc
 expand                  :   0.13 ( 0%) usr   13796 kB ( 1%) ggc
 post expand cleanups    :   0.03 ( 0%) usr    2868 kB ( 0%) ggc
 forward prop            :   0.08 ( 0%) usr     156 kB ( 0%) ggc
 CSE                     :   0.08 ( 0%) usr     304 kB ( 0%) ggc
 dead code elimination   :   0.03 ( 0%) usr       0 kB ( 0%) ggc
 dead store elim1        :   0.09 ( 0%) usr     763 kB ( 0%) ggc
 dead store elim2        :   0.08 ( 0%) usr     613 kB ( 0%) ggc
 loop init               :   0.15 ( 0%) usr      65 kB ( 0%) ggc
 branch prediction       :   0.12 ( 0%) usr      19 kB ( 0%) ggc
 combiner                :   0.10 ( 0%) usr     216 kB ( 0%) ggc
 if-conversion           :   0.01 ( 0%) usr       0 kB ( 0%) ggc
 integrated RA           :   0.43 ( 0%) usr    9659 kB ( 1%) ggc
 LRA non-specific        :   0.26 ( 0%) usr     305 kB ( 0%) ggc
 LRA virtuals elimination:   0.03 ( 0%) usr     304 kB ( 0%) ggc
 LRA create live ranges  :   0.03 ( 0%) usr     152 kB ( 0%) ggc
 LRA hard reg assignment :   0.02 ( 0%) usr       0 kB ( 0%) ggc
 reload CSE regs         :   0.19 ( 0%) usr     916 kB ( 0%) ggc
 thread pro- & epilogue  :   0.04 ( 0%) usr      14 kB ( 0%) ggc
 hard reg cprop          :   0.07 ( 0%) usr       0 kB ( 0%) ggc
 shorten branches        :   0.08 ( 0%) usr       0 kB ( 0%) ggc
 final                   :   0.16 ( 0%) usr     279 kB ( 0%) ggc
 initialize rtl          :   0.01 ( 0%) usr      12 kB ( 0%) ggc
 rest of compilation     :   0.31 ( 0%) usr     879 kB ( 0%) ggc
 remove unused locals    :   2.24 ( 2%) usr       0 kB ( 0%) ggc
 address taken           :   1.00 ( 1%) usr   37564 kB ( 2%) ggc
 rebuild frequencies     :   0.02 ( 0%) usr       0 kB ( 0%) ggc
 TOTAL                 : 128.90           1624743 kB

1 answer:

Answer 0 (score: 0)

I see some redundancy in the expressions. For example, the term

exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky))

appears in both h(0,2) and h(0,3).

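-O2 makes the compiler try to detect and reuse such patterns. With 6k lines of expressions, the complexity seems to be too high. You can help gcc by introducing temporary variables yourself. This amounts to building a dependency graph first and then generating the code from it.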
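A minimal sketch of the temporary-variable idea, limited to the two entries h(0,2) and h(0,3) quoted in the question. The names fill_direct, fill_hoisted, phase, e_minus and e_plus are made up for illustration and do not come from the original code. Whether hoisting alone is enough to get the full 6000-line file through -O2 has not been checked here.

```cpp
#include <complex>

typedef std::complex<double> cd;

// Direct form, as in the question: the same exp(...) is written out in every entry.
void fill_direct(double kx, double ky, cd& h02, cd& h03)
{
    h02 = cd(0., 0.)
        + 0.095916 / exp(cd(0, 1) * (0. - 7.55323078829979 * kx - 2.0238820899708214 * ky))
        - 0.131689 / exp(cd(0, 1) * (0. + 7.55323078829979 * kx + 2.0238820899708214 * ky));
    h03 = cd(-0.10825, 0.)
        - 0.011519 / exp(cd(0, 1) * (0. - 7.55323078829979 * kx - 2.0238820899708214 * ky));
}

// Hoisted form: each distinct exponential is computed once and reused,
// so the compiler does not have to rediscover the shared subexpressions.
void fill_hoisted(double kx, double ky, cd& h02, cd& h03)
{
    const cd i(0., 1.);
    const double phase = 7.55323078829979 * kx + 2.0238820899708214 * ky;
    const cd e_minus = std::exp(i * (-phase)); // exp(i*(0 - a*kx - b*ky))
    const cd e_plus  = std::exp(i * phase);    // exp(i*(0 + a*kx + b*ky))

    h02 = cd(0., 0.) + 0.095916 / e_minus - 0.131689 / e_plus;
    h03 = cd(-0.10825, 0.) - 0.011519 / e_minus;
}
```

In the real code, the temporaries would be declared once at the top of main, with each h(i,j) assignment referring to them. Since the expressions appear to be machine-generated, the generator is probably the most convenient place to do the hoisting.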