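I'm using the Eigen library for some matrix computations. I have to define a large matrix (actually not that large, only 300x300) in which every element is a long expression of complex exponentials.
To give you an idea of what I mean, here is a small part of the matrix definition: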
#include <iostream>
#include <complex>
#include <Eigen/Dense>
using namespace Eigen;
int main()
{
typedef std::complex<double> cd;
MatrixXcd h(300,300);
double kx,ky;
kx=1.;
ky=1.;
h.setZero(300,300);
h(0,0)=cd(6.942755,0.) + 0.043986/exp(cd(0,1)*(0. - 2.0238820899708214*kx - 7.55323078829979*ky)) - 0.010802/exp(cd(0,1)*(0. + 5.529348698328969*kx - 5.529348698328969*ky)) + 0.043986/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky)) + 0.043986/exp(cd(0,1)*(0. + 7.55323078829979*kx + 2.0238820899708214*ky)) - 0.010802/exp(cd(0,1)*(0. - 5.529348698328969*kx + 5.529348698328969*ky)) + 0.043986/exp(cd(0,1)*(0. + 2.0238820899708214*kx + 7.55323078829979*ky));
h(0,2)=cd(0.,0.) + 0.095916/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky)) - 0.131689/exp(cd(0,1)*(0. + 7.55323078829979*kx + 2.0238820899708214*ky));
h(0,3)=cd(-0.10825,0.) - 0.011519/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky));
...
...//6000 more lines omitted here
}
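Reading the three entries shown, every element has the same shape: a real constant plus a sum of weighted phase factors. Writing $c_{mn}$ for the leading constant and $t_j$, $a_j$, $b_j$ for the signed prefactor and the two coefficients of the $j$-th term (notation introduced here for illustration; the omitted lines are assumed to follow the same pattern):

$$h_{mn}(k_x,k_y) = c_{mn} + \sum_j \frac{t_j}{e^{i(a_j k_x + b_j k_y)}} = c_{mn} + \sum_j t_j\, e^{-i(a_j k_x + b_j k_y)}$$

For example, h(0,3) has $c_{03} = -0.10825$ and a single term with $t = -0.011519$, $a = -7.55323078829979$, $b = -2.0238820899708214$. Since $k_x$ and $k_y$ are real, each factor is $\cos\theta_j - i\sin\theta_j$ with $\theta_j = a_j k_x + b_j k_y$.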
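I'm using mingw-w64 on Windows, and the compiler is set up correctly. But when I compile the code above with
g++ -O2 code.cpp
the compilation fails and an error dialog pops up!
Watching the Task Manager closely, I see the compilation stop when memory usage reaches about 1 GB.
However, if I compile the same code with -O0, i.e. with all optimizations disabled, the compilation succeeds, even though memory usage peaks at close to 2 GB. So I'm sure the failure is not caused by memory.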
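Moreover, I can confirm that this behavior has nothing to do with the Eigen library. Even if I drop Eigen and replace every assignment with an assignment to the same variable, like this: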
#include <iostream>
#include <complex>
int main()
{
typedef std::complex<double> cd;
cd tmp;
double kx,ky;
kx=1.;
ky=1.;
tmp=cd(6.942755,0.) + 0.043986/exp(cd(0,1)*(0. - 2.0238820899708214*kx - 7.55323078829979*ky)) - 0.010802/exp(cd(0,1)*(0. + 5.529348698328969*kx - 5.529348698328969*ky)) + 0.043986/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky)) + 0.043986/exp(cd(0,1)*(0. + 7.55323078829979*kx + 2.0238820899708214*ky)) - 0.010802/exp(cd(0,1)*(0. - 5.529348698328969*kx + 5.529348698328969*ky)) + 0.043986/exp(cd(0,1)*(0. + 2.0238820899708214*kx + 7.55323078829979*ky));
tmp=cd(0.,0.) + 0.095916/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky)) - 0.131689/exp(cd(0,1)*(0. + 7.55323078829979*kx + 2.0238820899708214*ky));
tmp=cd(-0.10825,0.) - 0.011519/exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky));
... //6000 more lines omitted
}
compiling with -O2 still fails.
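Furthermore, the problem is not limited to the MinGW compiler. I also tried icl.exe from Intel Parallel Studio, and it was even worse: the compilation took more than 30 minutes and looked like it would keep going. I didn't have the patience to wait for it to finish, and it might well have failed in the end too.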
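So my question is: what causes the -O2 compilation to fail? How can I make -O2 work for my code, with its huge number of expressions? What also surprises me is that, although there are many expressions, they consist only of basic exp functions, so why does compiling them take so much time and memory? Are there any tricks to make compilation faster?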
Update
Following Marc Glisse's suggestion, I ran the following. -O1 works, but I want at least -O2, because the code is used for scientific computing and speed matters.
R:\>g++ -O1 -ftime-report eigen.cpp
Execution times (seconds)
phase setup : 0.01 ( 0%) usr 1540 kB ( 0%) ggc
phase parsing : 6.06 ( 5%) usr 412774 kB (25%) ggc
phase lang. deferred : 0.18 ( 0%) usr 6491 kB ( 0%) ggc
phase opt and generate : 122.65 (95%) usr 1203926 kB (74%) ggc
|name lookup : 0.61 ( 0%) usr 39968 kB ( 2%) ggc
|overload resolution : 2.18 ( 2%) usr 151685 kB ( 9%) ggc
garbage collection : 1.48 ( 1%) usr 0 kB ( 0%) ggc
callgraph construction : 0.65 ( 1%) usr 28545 kB ( 2%) ggc
callgraph optimization : 0.41 ( 0%) usr 6 kB ( 0%) ggc
ipa dead code removal : 0.02 ( 0%) usr 0 kB ( 0%) ggc
ipa inlining heuristics : 0.58 ( 0%) usr 6172 kB ( 0%) ggc
ipa reference : 0.02 ( 0%) usr 0 kB ( 0%) ggc
ipa profile : 0.11 ( 0%) usr 0 kB ( 0%) ggc
ipa pure const : 0.20 ( 0%) usr 0 kB ( 0%) ggc
cfg cleanup : 0.04 ( 0%) usr 0 kB ( 0%) ggc
trivially dead code : 0.05 ( 0%) usr 0 kB ( 0%) ggc
df scan insns : 0.09 ( 0%) usr 0 kB ( 0%) ggc
df multiple defs : 0.03 ( 0%) usr 0 kB ( 0%) ggc
df live regs : 0.13 ( 0%) usr 0 kB ( 0%) ggc
df live&initialized regs: 0.04 ( 0%) usr 0 kB ( 0%) ggc
df reg dead/unused notes: 0.17 ( 0%) usr 2440 kB ( 0%) ggc
register information : 0.01 ( 0%) usr 0 kB ( 0%) ggc
alias analysis : 0.05 ( 0%) usr 1546 kB ( 0%) ggc
alias stmt walking : 27.43 (21%) usr 19006 kB ( 1%) ggc
rebuild jump labels : 0.03 ( 0%) usr 0 kB ( 0%) ggc
preprocessing : 0.63 ( 0%) usr 8732 kB ( 1%) ggc
parser (global) : 0.30 ( 0%) usr 80513 kB ( 5%) ggc
parser struct body : 0.36 ( 0%) usr 20184 kB ( 1%) ggc
parser enumerator list : 0.03 ( 0%) usr 1004 kB ( 0%) ggc
parser function body : 3.52 ( 3%) usr 253532 kB (16%) ggc
parser inl. func. body : 0.16 ( 0%) usr 6243 kB ( 0%) ggc
parser inl. meth. body : 0.24 ( 0%) usr 12261 kB ( 1%) ggc
template instantiation : 0.75 ( 1%) usr 36791 kB ( 2%) ggc
early inlining heuristics: 0.74 ( 1%) usr 78738 kB ( 5%) ggc
inline parameters : 0.60 ( 0%) usr 3273 kB ( 0%) ggc
integration : 34.96 (27%) usr 421223 kB (26%) ggc
tree gimplify : 0.93 ( 1%) usr 78917 kB ( 5%) ggc
tree eh : 1.81 ( 1%) usr 147729 kB ( 9%) ggc
tree CFG construction : 0.26 ( 0%) usr 47487 kB ( 3%) ggc
tree CFG cleanup : 0.92 ( 1%) usr 0 kB ( 0%) ggc
tree copy propagation : 0.03 ( 0%) usr 0 kB ( 0%) ggc
tree PTA : 1.80 ( 1%) usr 167 kB ( 0%) ggc
tree PHI insertion : 0.07 ( 0%) usr 519 kB ( 0%) ggc
tree SSA rewrite : 1.63 ( 1%) usr 97983 kB ( 6%) ggc
tree SSA other : 0.13 ( 0%) usr 17 kB ( 0%) ggc
tree SSA incremental : 28.75 (22%) usr 5 kB ( 0%) ggc
tree operand scan : 2.13 ( 2%) usr 65917 kB ( 4%) ggc
dominator optimization : 0.08 ( 0%) usr 2043 kB ( 0%) ggc
tree SRA : 2.65 ( 2%) usr 56210 kB ( 3%) ggc
tree CCP : 2.42 ( 2%) usr 37765 kB ( 2%) ggc
tree split crit edges : 0.11 ( 0%) usr 2953 kB ( 0%) ggc
tree reassociation : 0.04 ( 0%) usr 0 kB ( 0%) ggc
tree FRE : 3.35 ( 3%) usr 35524 kB ( 2%) ggc
tree code sinking : 0.01 ( 0%) usr 0 kB ( 0%) ggc
tree linearize phis : 0.01 ( 0%) usr 6 kB ( 0%) ggc
tree backward propagate : 0.02 ( 0%) usr 0 kB ( 0%) ggc
tree forward propagate : 0.38 ( 0%) usr 8 kB ( 0%) ggc
tree conservative DCE : 0.13 ( 0%) usr 1 kB ( 0%) ggc
tree aggressive DCE : 0.33 ( 0%) usr 2 kB ( 0%) ggc
tree DSE : 0.45 ( 0%) usr 4 kB ( 0%) ggc
tree SSA uncprop : 0.01 ( 0%) usr 0 kB ( 0%) ggc
dominance frontiers : 0.06 ( 0%) usr 0 kB ( 0%) ggc
dominance computation : 0.65 ( 1%) usr 0 kB ( 0%) ggc
out of ssa : 0.09 ( 0%) usr 1 kB ( 0%) ggc
expand vars : 0.02 ( 0%) usr 765 kB ( 0%) ggc
expand : 0.13 ( 0%) usr 13796 kB ( 1%) ggc
post expand cleanups : 0.03 ( 0%) usr 2868 kB ( 0%) ggc
forward prop : 0.08 ( 0%) usr 156 kB ( 0%) ggc
CSE : 0.08 ( 0%) usr 304 kB ( 0%) ggc
dead code elimination : 0.03 ( 0%) usr 0 kB ( 0%) ggc
dead store elim1 : 0.09 ( 0%) usr 763 kB ( 0%) ggc
dead store elim2 : 0.08 ( 0%) usr 613 kB ( 0%) ggc
loop init : 0.15 ( 0%) usr 65 kB ( 0%) ggc
branch prediction : 0.12 ( 0%) usr 19 kB ( 0%) ggc
combiner : 0.10 ( 0%) usr 216 kB ( 0%) ggc
if-conversion : 0.01 ( 0%) usr 0 kB ( 0%) ggc
integrated RA : 0.43 ( 0%) usr 9659 kB ( 1%) ggc
LRA non-specific : 0.26 ( 0%) usr 305 kB ( 0%) ggc
LRA virtuals elimination: 0.03 ( 0%) usr 304 kB ( 0%) ggc
LRA create live ranges : 0.03 ( 0%) usr 152 kB ( 0%) ggc
LRA hard reg assignment : 0.02 ( 0%) usr 0 kB ( 0%) ggc
reload CSE regs : 0.19 ( 0%) usr 916 kB ( 0%) ggc
thread pro- & epilogue : 0.04 ( 0%) usr 14 kB ( 0%) ggc
hard reg cprop : 0.07 ( 0%) usr 0 kB ( 0%) ggc
shorten branches : 0.08 ( 0%) usr 0 kB ( 0%) ggc
final : 0.16 ( 0%) usr 279 kB ( 0%) ggc
initialize rtl : 0.01 ( 0%) usr 12 kB ( 0%) ggc
rest of compilation : 0.31 ( 0%) usr 879 kB ( 0%) ggc
remove unused locals : 2.24 ( 2%) usr 0 kB ( 0%) ggc
address taken : 1.00 ( 1%) usr 37564 kB ( 2%) ggc
rebuild frequencies : 0.02 ( 0%) usr 0 kB ( 0%) ggc
TOTAL : 128.90 1624743 kB
Answer
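I see some redundancy in the expressions. For example, the term
exp(cd(0,1)*(0. - 7.55323078829979*kx - 2.0238820899708214*ky))
appears in both h(0,2) and h(0,3).
With -O2 the compiler tries to detect and reuse such patterns itself, and with 6k lines of expressions the complexity seems to be too high. You can help gcc by introducing tmp variables for the shared terms. This amounts to building a dependency graph and then generating the code from it.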