Question

My code traverses a binary tree recursively. To do this I have a number of parameters I need to keep track of, so my function looks like this:

FindPoints(int leftchild, int rightchild, int ly_index, int uy_index, int bit, int nodepos, int amount, int level);

It is called a great many times. Will the performance of my program suffer because of the number of parameters?

Answer 1

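The process during recursion is:

Allocate space for the arguments on the stack. This is usually done by subtracting a value from the stack pointer register.
Copy the variable values onto the stack. The cost depends on whether an object or a plain value is passed.
Call the function. This may force the processor to flush its instruction pipeline or reload the instruction cache.
At the end of the function, restore the stack pointer by adding the same value back.
Return from the function call; this may again disturb the instruction pipeline or cache.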

The usual concern is not performance but recursion depth and stack size. Recursion that exceeds the stack limit is a stack overflow defect.

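An iterative solution may be faster because the compiler may be able to optimize the loop. Recursive calls are harder for a compiler to optimize.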
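As a rough illustration of the iterative alternative, the sketch below walks a binary tree with an explicit stack held in a std::vector, so pending work lives on the heap rather than in call frames. The Node layout (child indices into a flat array, -1 for "no child") and the function name CountNodesIterative are assumptions made for this example; they are not taken from the question's code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node layout: children are indices into a flat array,
// and -1 means "no child".
struct Node {
    int left;
    int right;
};

// Counts the nodes reachable from root without recursion.
// Nodes still to be visited are kept in a std::vector used as a stack.
int CountNodesIterative(const std::vector<Node>& tree, int root)
{
    if (root < 0) return 0;

    std::vector<int> pending;
    pending.push_back(root);

    int count = 0;
    while (!pending.empty()) {
        const int index = pending.back();
        pending.pop_back();
        ++count;

        const Node& node = tree[index];
        // Push right first so the left child is visited first.
        if (node.right >= 0) pending.push_back(node.right);
        if (node.left >= 0) pending.push_back(node.left);
    }
    return count;
}
```

Whether this is actually faster than the recursive version depends on the compiler and the shape of the tree, so it is worth measuring both.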

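By the way, on modern processors the worst-case time for a recursive call is under 1 millisecond, and it is typically on the order of nanoseconds. So you are trying to squeeze nanoseconds out of your program. That is not a great return on investment (ROI).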

Answer 2

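Performance depends on many factors; ideally you would try it one way, try it the other way, and compare. However, here are some general considerations that may help you understand what is going on: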

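If your function does a lot of work, the time spent on the function call itself will not be significant.
If your function mostly passes its parameters on unchanged, you should definitely consider restructuring your code. If it calls itself with all (or almost all) parameters having different values, there is no way to improve the code; it is simply that complicated.
Function call performance depends on the calling convention. Compilers usually pass the first few parameters in registers (very fast) and the rest on the stack (slower). You may want to keep the number of parameters small (2 for fastcall, about 4 on ARM; those are the only two examples I know) so that they all fit in registers.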

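To expand on the second point: if your function does not change most of its parameters, every call copies those parameters around on the stack, which is completely useless work for the computer. Besides wasting time, it also wastes space in the data cache (causing further slowdown, which is especially annoying because it cannot even be attributed to any particular code) and may cause a stack overflow (or may not, depending on your OS).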

One way to improve the code in this situation: use a struct that holds all the unchanging parameters, and pass a pointer or reference to it:

struct DataForFindPoints
{
    int ly_index;
    int uy_index;
    int bit;
    int nodepos;
    int amount;
    int level;
};

FindPoints(int leftchild, int rightchild, const DataForFindPoints& data);

Or (the object-oriented way): make a class with FindPoints as a member function and all the unchanging parameters as fields.
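A minimal sketch of the object-oriented variant might look like the following. It assumes that ly_index and uy_index stay fixed while nodepos and level change on every call; the question does not say which parameters actually vary. The class name PointFinder, the matches() accessor, and the range-counting logic are placeholders for illustration, not the asker's real algorithm.

```cpp
#include <cassert>

// Sketch only: field names come from the question, but the traversal
// logic is a placeholder for whatever FindPoints really does.
class PointFinder {
public:
    PointFinder(int ly_index, int uy_index)
        : ly_index_(ly_index), uy_index_(uy_index), matches_(0) {}

    // Only the values that change per call are passed as arguments;
    // the fixed ones are read through the implicit this pointer.
    void FindPoints(int nodepos, int level)
    {
        if (nodepos >= ly_index_ && nodepos <= uy_index_) ++matches_;
        if (level == 0) return;
        FindPoints(2 * nodepos + 1, level - 1);  // left child (heap-style indexing, assumed)
        FindPoints(2 * nodepos + 2, level - 1);  // right child
    }

    int matches() const { return matches_; }

private:
    int ly_index_;  // unchanged during the recursion
    int uy_index_;  // unchanged during the recursion
    int matches_;   // result accumulated across calls
};
```

Each recursive call now passes two integers plus the hidden this pointer, which should fit in registers under common calling conventions.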

Answer 3

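Short answer

On Windows, compiling with Visual Studio 2010 in release mode for the x64 platform, passing the parameters individually is considerably slower than passing a single struct by reference, or even by value.

The results are as follows: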

Multi result = 0; multi iterations = 10000
Ref result = 0; ref iterations = 10000
Value result = 0; value iterations = 10000

---------------------------------------------------
Timer "multi args":
Total time = 0.387886

------------------------------------------
Timer "struct by reference":
Total time = 0.0679177

------------------------------------------
Timer "struct by value":
Total time = 0.143382

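Observation

The more computation your function does in its body, the less the copying overhead affects performance. In fact, I benchmarked a function that performs only a few additions and one division.

Now some details

I defined a struct that contains all the parameters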

struct Args{
    int leftchild;
    int rightchild;
    int ly_index;
    int uy_index;
    int bit;
    int nodepos;
    int amount;
    int level;

    Args(int l, int r, int ly, int uy, int b, int n, int a, int lev)
        : leftchild(l)
        , rightchild(r)
        , ly_index(ly)
        , uy_index(uy)
        , bit(b)
        , nodepos(n)
        , amount(a)
        , level(lev)
    {}
};

and 3 functions.

static size_t counter1 = 0;
static size_t counter2 = 0;
static size_t counter3 = 0;

int FindPoints(int leftchild, int rightchild, int ly_index, int uy_index, int bit, int nodepos, int amount, int level)
{
    ++counter1;
    leftchild = leftchild + (rightchild + ly_index + uy_index + bit + nodepos + amount + level) / 100 - 1;
    return leftchild ? FindPoints( leftchild, rightchild, ly_index, uy_index, bit, nodepos, amount, level) : 0;
}

int FindPointsRef( Args& a )
{
    ++counter2;
    a.leftchild = a.leftchild + (a.rightchild + a.ly_index + a.uy_index + a.bit + a.nodepos + a.amount + a.level) / 100 - 1;
    return a.leftchild ? FindPointsRef( a ) : 0;
}

int FindPointsValue( Args a )
{
    ++counter3;
    a.leftchild = a.leftchild + (a.rightchild + a.ly_index + a.uy_index + a.bit + a.nodepos + a.amount + a.level) / 100 - 1;
    return a.leftchild ? FindPointsValue( a ) : 0;
}

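They all do the same work, but the first takes individual parameters as in your question, the second takes a reference to a struct of parameters, and the third takes the struct by value.

I built the program with Visual Studio 2010 in the release x64 configuration, and I measured with a homemade class that simply wraps the Windows function QueryPerformanceCounter and provides a convenient output operator.

The main function looks like this: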

int main()
{
    // define my timers
    PersistentTimer timer_multi("multi args");
    PersistentTimer timer_ref("struct by reference");
    PersistentTimer timer_value("struct by value");

    int leftchild = 10000;  // number of iterations; 10000 to prevent stack overflow
    int rightchild = 1;     // sum of other values is < 100 (look to FindPoints* implementations)
    int ly_index = 2;
    int uy_index = 3;
    int bit = 4;
    int nodepos = 5;
    int amount = 6;
    int level = 7;

    // define structs of arguments for second and third function
    Args args_ref( leftchild, rightchild, ly_index, uy_index, bit, nodepos, amount, level );
    Args args_copy( leftchild, rightchild, ly_index, uy_index, bit, nodepos, amount, level );

    // return values initialized to a non-zero value just to be sure that the functions have done their job
    int a1 = 5;
    timer_multi.measure([&]{
        a1 = FindPoints( leftchild, rightchild, ly_index, uy_index, bit, nodepos, amount, level );
    });
    std::cout << "Multi result = " << a1 << "; multi iterations = " << counter1 << '\n';

    int a2 = 5;
    timer_ref.measure([&]{
        a2 = FindPointsRef( args_ref );
    });
    std::cout << "Ref result = " << a2 << "; ref iterations = " << counter2  << '\n';

    int a3 = 5;
    timer_value.measure([&]{
        a3 = FindPointsValue( args_copy );
    });
    std::cout << "Value result = " << a3 << "; value iterations = " << counter3  << '\n';

    // print timer results
    std::cout << timer_multi << timer_ref << timer_value;

    getchar();

}
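The PersistentTimer class is not shown in the answer. To reproduce the benchmark, a portable stand-in could be sketched as below. This is an assumption-laden substitute: it uses std::chrono::steady_clock instead of QueryPerformanceCounter, reports seconds (the unit of the original results is not stated), and the accessors total_seconds() and name() are invented for this example.

```cpp
#include <cassert>
#include <chrono>
#include <ostream>
#include <string>
#include <utility>

// Hypothetical stand-in for the answer's homemade timer class.
// Uses std::chrono::steady_clock rather than QueryPerformanceCounter.
class PersistentTimer {
public:
    explicit PersistentTimer(std::string name)
        : name_(std::move(name)), total_seconds_(0.0) {}

    // Runs the callable once and adds its wall-clock duration to the total.
    template <typename Callable>
    void measure(Callable&& work)
    {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        total_seconds_ += std::chrono::duration<double>(stop - start).count();
    }

    double total_seconds() const { return total_seconds_; }
    const std::string& name() const { return name_; }

private:
    std::string name_;
    double total_seconds_;  // accumulated over all measure() calls
};

// Prints in a layout similar to the results listed in the answer.
inline std::ostream& operator<<(std::ostream& out, const PersistentTimer& timer)
{
    return out << "------------------------------------------\n"
               << "Timer \"" << timer.name() << "\":\n"
               << "Total time = " << timer.total_seconds() << "\n\n";
}
```

With this class plus `#include <iostream>` and `#include <cstdio>`, the main function above should compile unchanged. Timings will differ from the figures in the answer, since the compiler, hardware, and clock source are all different.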

Answer 4

There is no significant effect on performance. It does not matter. If you need extreme performance, you should do it iteratively.

However, this is dirty code. You should try to encapsulate the parameters in a struct or class. That is safer and easier to maintain.
