Disappointing performance of Eigen tensors for small sizes

Time: 2017-11-04 16:47:45

Tags: c++ performance-testing eigen3

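I have a question about Eigen's unsupported Tensor module, see 1234. I would like to use it for small tensors, e.g. second- and fourth-order tensors in three dimensions. I wrote a performance test using Google's benchmark module. In my view, Eigen performs very poorly here.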
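For reference, the operation being timed is the double contraction of a fourth-order tensor with a second-order tensor. The symbols A, B and C match the variable names in the code; the index names i, j, k, l are only chosen here for illustration:

```latex
C_{ij} = \sum_{k=1}^{3} \sum_{l=1}^{3} B_{ijkl} \, A_{kl}, \qquad i, j \in \{1, 2, 3\}
```

That is 9 output entries with 9 multiply-adds each, so 81 multiply-adds per contraction.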

Here is the code I used.

#include <unsupported/Eigen/CXX11/Tensor>
#include <benchmark/benchmark.h>
#include <array>

// eigen double contraction benchmark
void BM_eigen(benchmark::State& state)
{
    Eigen::TensorFixedSize<double, Eigen::Sizes<3,3>> A;
    A.setConstant(1.0);
    Eigen::TensorFixedSize<double, Eigen::Sizes<3,3,3,3>> B;
    B.setConstant(2.0);
    Eigen::TensorFixedSize<double, Eigen::Sizes<3,3>> C;
    C.setConstant(0.0);

    static const Eigen::array<Eigen::IndexPair<int>, 2> contraction_pair
        { Eigen::IndexPair<int>(2, 0), Eigen::IndexPair<int>(3, 1) };

    while (state.KeepRunning())
    {
        for (int i = 0; i < state.range(0); ++i)
        {
            C = B.contract(A, contraction_pair);
            benchmark::DoNotOptimize(A);
            benchmark::DoNotOptimize(B);
            benchmark::DoNotOptimize(C);
        }
    }
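    // Note: sizeof(nullptr) is a constant factor (8 on typical 64-bit targets),
    // so the reported items/s is scaled by it in both benchmarks.
    state.SetItemsProcessed(state.iterations() * state.range(0) * sizeof(nullptr));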
}

// raw loops for double contraction benchmark
void BM_loops(benchmark::State& state)
{
    std::array<double, 9> A;
    A.fill(1.0);
    std::array<double, 81> B;
    B.fill(2.0);
    std::array<double, 9> C;
    C.fill(0.0);

    while (state.KeepRunning())
    {
        for (int n = 0; n < state.range(0); ++n)
        {
            for(std::size_t i = 0; i < 3; i ++)
                for(std::size_t j = 0; j < 3; j++)
                {
                    C[i + j * 3] = 0.0;
                    for (std::size_t k = 0; k < 3; k++)
                        for (std::size_t l = 0; l < 3; l++)
                            // C[i * 3 + j] += B[i * 27 + 9 * j + 3 * k + l] * A[k * 3 + l];
                            C[i + j * 3] += B[i + 3 * j + 9 * k + 27 * l] * A[k + l * 3];
                }

            benchmark::DoNotOptimize(A);
            benchmark::DoNotOptimize(B);
            benchmark::DoNotOptimize(C);
        }
    }
    state.SetItemsProcessed(state.iterations() * state.range(0) * sizeof(nullptr));
}

BENCHMARK(BM_loops)->Arg(1);
BENCHMARK(BM_loops)->Arg(8);
BENCHMARK(BM_loops)->Arg(64);
BENCHMARK(BM_loops)->Arg(512);
BENCHMARK(BM_loops)->Arg(1024);

BENCHMARK(BM_eigen)->Arg(1);
BENCHMARK(BM_eigen)->Arg(8);
BENCHMARK(BM_eigen)->Arg(64);
BENCHMARK(BM_eigen)->Arg(512);
BENCHMARK(BM_eigen)->Arg(1024);

BENCHMARK_MAIN();

I compiled it with

g++ -DNDEBUG -O3 -std=c++14 -I /usr/local/include/eigen3/ benchmark.cpp -lbenchmark -lpthread -o benchmark

When I run the code, I get the following output:

$ ./benchmark 
Run on (4 X 3300 MHz CPU s)
2017-11-04 17:38:52
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------
Benchmark              Time           CPU Iterations
-----------------------------------------------------
BM_loops/1            47 ns         47 ns   15196904   161.447M items/s
BM_loops/8           376 ns        376 ns    1961059    162.12M items/s
BM_loops/64         3666 ns       3664 ns     251142   133.257M items/s
BM_loops/512       27198 ns      27196 ns      25595   143.635M items/s
BM_loops/1024      54874 ns      54834 ns      12663   142.475M items/s
BM_eigen/1           490 ns        490 ns    1218410   15.5745M items/s
BM_eigen/8          4697 ns       4693 ns     173951   13.0047M items/s
BM_eigen/64        32712 ns      32711 ns      22122   14.9273M items/s
BM_eigen/512      241772 ns     241599 ns       2822   16.1684M items/s
BM_eigen/1024     501559 ns     501541 ns       1455    15.577M items/s
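
As a rough check of these numbers, dividing the Eigen time by the loop time in the Time column for each argument gives approximately:

```latex
\begin{aligned}
\text{Arg}=1:&\quad 490 / 47 \approx 10.4 \\
\text{Arg}=8:&\quad 4697 / 376 \approx 12.5 \\
\text{Arg}=64:&\quad 32712 / 3666 \approx 8.9 \\
\text{Arg}=512:&\quad 241772 / 27198 \approx 8.9 \\
\text{Arg}=1024:&\quad 501559 / 54874 \approx 9.1
\end{aligned}
```

Since CPU scaling was enabled during the run, these ratios should be read as approximate.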

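This means that using Eigen slows my code down by roughly a factor of 10. Am I doing something wrong, or is the Eigen tensor module really that slow for small sizes?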
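
To make the index arithmetic in the raw-loop version easier to check, here is a minimal sketch using only the standard library. It assumes the same column-major layout as BM_loops and shows that the double contraction is the same computation as a 9x9 matrix times a 9-vector. The names Tensor2, Tensor4, contract_loops, contract_matvec and check_contractions_agree are hypothetical helpers introduced for illustration; they are not part of Eigen or of the benchmark above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical aliases for this sketch. Column-major storage as in BM_loops:
// B(i,j,k,l) lives at i + 3*j + 9*k + 27*l, and A(k,l) at k + 3*l.
using Tensor2 = std::array<double, 9>;
using Tensor4 = std::array<double, 81>;

// Double contraction written with four nested loops, as in BM_loops.
Tensor2 contract_loops(const Tensor4& B, const Tensor2& A)
{
    Tensor2 C{};  // value-initialized to zero
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            for (std::size_t k = 0; k < 3; ++k)
                for (std::size_t l = 0; l < 3; ++l)
                    C[i + 3 * j] += B[i + 3 * j + 9 * k + 27 * l] * A[k + 3 * l];
    return C;
}

// Same contraction read as a 9x9 column-major matrix times a 9-vector:
// row r = i + 3*j, column c = k + 3*l, so B[r + 9*c] is the (r, c) entry.
Tensor2 contract_matvec(const Tensor4& B, const Tensor2& A)
{
    Tensor2 C{};
    for (std::size_t c = 0; c < 9; ++c)
        for (std::size_t r = 0; r < 9; ++r)
            C[r] += B[r + 9 * c] * A[c];
    return C;
}

// Self-check with integer-valued entries, so both summation orders are exact.
void check_contractions_agree()
{
    Tensor4 B;
    Tensor2 A;
    for (std::size_t n = 0; n < 81; ++n) B[n] = static_cast<double>(n);
    for (std::size_t n = 0; n < 9; ++n) A[n] = static_cast<double>(n + 1);
    assert(contract_loops(B, A) == contract_matvec(B, A));
}
```

Calling check_contractions_agree() from a small main (without -DNDEBUG, so that assert stays active) confirms the two forms agree. The sketch says nothing about speed: whether the matrix-vector form, or Eigen applied to it, would be faster than the tensor contraction is something that would have to be measured.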

Best regards,

纳斯

0 answers:

No answers yet