Question

I would like to know why I get unexpected performance with these two pairs of seemingly straightforward recursion examples.

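The same recursive function is faster as a member of a struct (rec2 vs. rec1), and the same recursive template function is faster with a dummy argument (rec4 vs. rec3)!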

Are C++ functions faster when they take more arguments?!

Here is the code I tried:

#include <QDebug>
#include <QElapsedTimer>
#include <cstddef>


constexpr std::size_t N = 28;
std::size_t counter = 0;


// non-template function taking 1 argument
void rec1(std::size_t depth)
{
    ++counter;
    if ( depth < N )
    {
        rec1(depth + 1);
        rec1(depth + 1);
    }
}

// non-template member function taking 2 arguments (counting the implicit this)
struct A
{
    void rec2(std::size_t depth)
    {
        ++counter;
        if ( depth < N )
        {
            rec2(depth + 1);
            rec2(depth + 1);
        }
    }
};

// template function taking 0 arguments
template <std::size_t D>
void rec3()
{
    ++counter;
    rec3<D - 1>();
    rec3<D - 1>();
}

template <>
void rec3<0>()
{
    ++counter;
}

// template function taking 1 (dummy) argument
struct Foo
{
    int x;
};

template <std::size_t D>
void rec4(Foo x)
{
    ++counter;
    rec4<D - 1>(x);
    rec4<D - 1>(x);
}

template <>
void rec4<0>(Foo x)
{
    ++counter;
}


int main()
{
    QElapsedTimer t;
    t.start();
    rec1(0);
    qDebug() << "time1" << t.elapsed();
    qDebug() << "counter" << counter;
    counter = 0;
    A a;
    t.start();
    a.rec2(0);
    qDebug()<< "time2"  << t.elapsed();
    qDebug()<< "counter"  << counter;
    counter = 0;
    t.start();
    rec3<N>();
    qDebug()<< "time3"  << t.elapsed();
    qDebug()<< "counter"  << counter;
    counter = 0;
    t.start();
    rec4<N>(Foo());
    qDebug()<< "time4"  << t.elapsed();
    qDebug()<< "counter"  << counter;

    qDebug() << "fin";

    return 0;
}

I got this output (times in milliseconds, as returned by QElapsedTimer::elapsed()):

time1 976 
counter 536870911 
time2 341 
counter 536870911 
time3 241 
counter 536870911 
time4 201 
counter 536870911 
fin

My setup: Windows 8.1 / i7 3630QM / latest Qt toolchain / C++14 enabled

Answer 1

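I was finally able to look at this in Visual Studio 2015 Community. Examining the disassembly of the compiled code shows that rec1 and rec2 are both compiled as genuinely recursive functions. Their generated code is very similar, and although rec2 has more instructions, it runs slightly faster. rec3 and rec4 each generate a series of functions, one for every distinct value of D in the template parameter. In this case the compiler has eliminated many of the function calls, removed some of the functions entirely, and instead adds a single larger value to counter. (For example, rec4<10> just adds 2047 to counter and returns.)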
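The value 2047 is what you would expect if the whole call tree below rec4<10> is folded into one addition: a node at template depth D increments counter once and then calls depth D - 1 twice, so it accounts for 2^(D+1) - 1 increments in total. A minimal sketch checking that arithmetic at compile time follows; the helper names count_calls and closed_form are hypothetical and do not appear in the question's code.

```cpp
#include <cstddef>

// Hypothetical helper: number of ++counter operations performed by rec3<d>
// or rec4<d>, following the recurrence calls(d) = 1 + 2 * calls(d - 1),
// with calls(0) = 1.
constexpr std::size_t count_calls(std::size_t d)
{
    return d == 0 ? 1 : 1 + 2 * count_calls(d - 1);
}

// Hypothetical helper: the closed form of the same recurrence, 2^(d+1) - 1.
constexpr std::size_t closed_form(std::size_t d)
{
    return (std::size_t{1} << (d + 1)) - 1;
}

// rec4<10> adds 2047, as seen in the disassembly described above.
static_assert(count_calls(10) == 2047u, "recurrence gives 2047 for D = 10");
static_assert(closed_form(10) == 2047u, "closed form gives 2047 for D = 10");

// With N = 28 both forms match the counter value printed by the benchmark.
static_assert(count_calls(28) == 536870911u, "recurrence matches the output");
static_assert(closed_form(28) == 536870911u, "closed form matches the output");
```

Because the total is a compile-time constant for each D, an optimizer that inlines far enough can replace an entire subtree of calls with one addition, which is consistent with the very small times measured for rec3 and rec4.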

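So the performance difference you are seeing is mostly due to how the compiler optimizes each version, with slight differences in how the code flows through the CPU also playing a part.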

My results (times measured in seconds), compiled with /Ox /O2:

time1 1.03411
counter 536870911
time2 0.970455
counter 536870911
time3 0.000866
counter 536870911
time4 0.000804
counter 536870911
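If you want to repeat the measurement without Qt, one option is std::chrono::steady_clock. This is only a sketch under assumptions, not the harness used for the numbers above: the helper name time_seconds is hypothetical, and N is reduced to 20 here purely to keep the run short.

```cpp
#include <chrono>
#include <cstddef>

constexpr std::size_t N = 20;  // assumption: smaller than the question's 28
std::size_t counter = 0;

// Same shape as rec1 in the question: one increment, then two recursive calls.
void rec1(std::size_t depth)
{
    ++counter;
    if (depth < N)
    {
        rec1(depth + 1);
        rec1(depth + 1);
    }
}

// Hypothetical helper: runs f once and returns the elapsed wall-clock time
// in seconds, measured with a monotonic clock.
template <typename F>
double time_seconds(F f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, time_seconds([] { rec1(0); }) times one full run of rec1. A single run is noisy, so repeating the measurement several times and comparing the disassembly, as done above, gives a more reliable picture than the raw numbers alone.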

Recursion leads to unexpected performance results
