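I have a question about Eigen's unsupported Tensor module, see 1, 2, 3 and 4. I would like to use it for small tensors, e.g. second- and fourth-order tensors in 3 dimensions. I wrote a performance test using Google's benchmark library, and in my opinion Eigen performs very poorly.
This is the benchmark code I used.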
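For reference, the operation in question is the double contraction C(i,j) = sum over k,l of B(i,j,k,l) * A(k,l). Below is a minimal standalone sketch of it, assuming column-major storage (first index fastest, which is Eigen's default layout). The helper name `double_contract` is hypothetical and not part of Eigen.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not an Eigen API):
// C(i,j) = sum_{k,l} B(i,j,k,l) * A(k,l) for 3-dimensional tensors,
// stored flat in column-major order (first index varies fastest).
std::array<double, 9> double_contract(const std::array<double, 81>& B,
                                      const std::array<double, 9>& A)
{
    std::array<double, 9> C{};  // value-initialized, i.e. all zeros
    for (std::size_t l = 0; l < 3; ++l)
        for (std::size_t k = 0; k < 3; ++k)
            for (std::size_t j = 0; j < 3; ++j)
                for (std::size_t i = 0; i < 3; ++i)
                    C[i + 3 * j] += B[i + 3 * j + 9 * k + 27 * l] * A[k + 3 * l];
    return C;
}
```

With A filled with 1.0 and B filled with 2.0, every entry of C is the sum of nine products 2.0 * 1.0, i.e. 18.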
#include <unsupported/Eigen/CXX11/Tensor>
#include <benchmark/benchmark.h>
#include <array>
#include <cstddef>

// Eigen double contraction benchmark: C(i,j) = sum over k,l of B(i,j,k,l) * A(k,l)
void BM_eigen(benchmark::State& state)
{
    Eigen::TensorFixedSize<double, Eigen::Sizes<3,3>> A;
    A.setConstant(1.0);
    Eigen::TensorFixedSize<double, Eigen::Sizes<3,3,3,3>> B;
    B.setConstant(2.0);
    Eigen::TensorFixedSize<double, Eigen::Sizes<3,3>> C;
    C.setConstant(0.0);
    // Contract dimensions 2 and 3 of B with dimensions 0 and 1 of A.
    static const Eigen::array<Eigen::IndexPair<int>, 2> contraction_pair
        { Eigen::IndexPair<int>(2, 0), Eigen::IndexPair<int>(3, 1) };
    while (state.KeepRunning())
    {
        // Repeat the contraction state.range(0) times per benchmark iteration.
        for (int rep = 0; rep < state.range(0); ++rep)
        {
            C = B.contract(A, contraction_pair);
            benchmark::DoNotOptimize(A);
            benchmark::DoNotOptimize(B);
            benchmark::DoNotOptimize(C);
        }
    }
    // The item count is scaled by sizeof(nullptr); this affects only the items/s column.
    state.SetItemsProcessed(state.iterations() * state.range(0) * sizeof(nullptr));
}
// Raw-loop double contraction benchmark (column-major indexing, first index fastest)
void BM_loops(benchmark::State& state)
{
    std::array<double, 9> A;
    A.fill(1.0);
    std::array<double, 81> B;
    B.fill(2.0);
    std::array<double, 9> C;
    C.fill(0.0);
    while (state.KeepRunning())
    {
        // Repeat the contraction state.range(0) times per benchmark iteration.
        // The counter is named rep so that it is not shadowed by the index i below.
        for (int rep = 0; rep < state.range(0); ++rep)
        {
            for (std::size_t i = 0; i < 3; i++)
                for (std::size_t j = 0; j < 3; j++)
                {
                    C[i + j * 3] = 0.0;
                    for (std::size_t k = 0; k < 3; k++)
                        for (std::size_t l = 0; l < 3; l++)
                            // Row-major alternative:
                            // C[i * 3 + j] += B[i * 27 + 9 * j + 3 * k + l] * A[k * 3 + l];
                            C[i + j * 3] += B[i + 3 * j + 9 * k + 27 * l] * A[k + l * 3];
                }
            benchmark::DoNotOptimize(A);
            benchmark::DoNotOptimize(B);
            benchmark::DoNotOptimize(C);
        }
    }
    // The item count is scaled by sizeof(nullptr); this affects only the items/s column.
    state.SetItemsProcessed(state.iterations() * state.range(0) * sizeof(nullptr));
}
BENCHMARK(BM_loops)->Arg(1);
BENCHMARK(BM_loops)->Arg(8);
BENCHMARK(BM_loops)->Arg(64);
BENCHMARK(BM_loops)->Arg(512);
BENCHMARK(BM_loops)->Arg(1024);
BENCHMARK(BM_eigen)->Arg(1);
BENCHMARK(BM_eigen)->Arg(8);
BENCHMARK(BM_eigen)->Arg(64);
BENCHMARK(BM_eigen)->Arg(512);
BENCHMARK(BM_eigen)->Arg(1024);
BENCHMARK_MAIN();
I compiled it with
g++ -DNDEBUG -O3 -std=c++14 -I /usr/local/include/eigen3/ benchmark.cpp -lbenchmark -lpthread -o benchmark
When I run the code, I get the following output:
$ ./benchmark
Run on (4 X 3300 MHz CPU s)
2017-11-04 17:38:52
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------
BM_loops/1 47 ns 47 ns 15196904 161.447M items/s
BM_loops/8 376 ns 376 ns 1961059 162.12M items/s
BM_loops/64 3666 ns 3664 ns 251142 133.257M items/s
BM_loops/512 27198 ns 27196 ns 25595 143.635M items/s
BM_loops/1024 54874 ns 54834 ns 12663 142.475M items/s
BM_eigen/1 490 ns 490 ns 1218410 15.5745M items/s
BM_eigen/8 4697 ns 4693 ns 173951 13.0047M items/s
BM_eigen/64 32712 ns 32711 ns 22122 14.9273M items/s
BM_eigen/512 241772 ns 241599 ns 2822 16.1684M items/s
BM_eigen/1024 501559 ns 501541 ns 1455 15.577M items/s
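As a rough check of these numbers, the per-row slowdown is simply the ratio of the two CPU times. A small sketch, with the values copied from the output above; the names `loops_ns`, `eigen_ns` and `slowdown` are only illustrative:

```cpp
#include <array>
#include <cstddef>

// CPU times in ns from the benchmark output, for Arg = 1, 8, 64, 512, 1024.
const std::array<double, 5> loops_ns{{47, 376, 3664, 27196, 54834}};
const std::array<double, 5> eigen_ns{{490, 4693, 32711, 241599, 501541}};

// Slowdown of the Eigen version relative to the raw loops for row n.
double slowdown(std::size_t n)
{
    return eigen_ns[n] / loops_ns[n];
}
```

The ratios come out between roughly 8.9 (for Arg = 512) and 12.5 (for Arg = 8).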
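This means that using Eigen slows my code down by roughly a factor of 10. Am I doing something wrong, or is the Eigen Tensor module really that slow for small sizes?
Best regards,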
纳斯