Question

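So I have been testing the running times of sorting algorithms in C++ and tweaking the code to see how each change affects speed. One change was to do the bubble sort swap inline in the sorting algorithm instead of through a separate function call. I expected that to be faster, since a function call sets up its own stack frame, but it turned out to be almost twice as slow, and I don't understand why.

Here is the code: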

void Swap(int& x, int& y)
{
    int temp = x;
    x = y;
    y = temp;
}
void BubbleSort(int data[], int size)
{
    int i, j, temp;
    bool is_sorted = false;
    for (i = 0; i < (size - 1) && !is_sorted; i++)
    {
        is_sorted = true;
        for (j = size - 1; j > i; j--)
            if (data[j] < data[j - 1])
            {
                // used to be Swap(data[j], data[j - 1]);
                temp = data[j];
                data[j] = data[j - 1];
                data[j-1] = temp;
                is_sorted = false;
            }
    }
}

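Edit, in response to the comments: yes, I did build in release mode with compiler optimizations enabled. If you want to see the running times I am getting, the full code is here: https://gist.github.com/anonymous/7363330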

Answer 1

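My guess is that, inside the function, the compiler optimizes away the temp variable and can recognize the swap. Without the function, the scope of temp extends beyond the block in which it is used, so without a sufficient optimization level the compiler may always store the last "temporary" value in it.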

Try moving the declaration of temp from outside the loops to the place where you use it, i.e. int temp = data[j].
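A minimal sketch of that suggestion, assuming the rest of the question's BubbleSort stays unchanged. The name BubbleSortLocalTemp is made up for this example, and whether the change affects timing at all depends on the compiler and optimization level:

```cpp
// Same algorithm as the question's BubbleSort, but temp is declared at the
// point of use, so its lifetime ends with the swap block.
// BubbleSortLocalTemp is a hypothetical name used only for this example.
void BubbleSortLocalTemp(int data[], int size)
{
    bool is_sorted = false;
    for (int i = 0; i < (size - 1) && !is_sorted; i++)
    {
        is_sorted = true;
        for (int j = size - 1; j > i; j--)
        {
            if (data[j] < data[j - 1])
            {
                int temp = data[j];      // scope limited to this block
                data[j] = data[j - 1];
                data[j - 1] = temp;
                is_sorted = false;
            }
        }
    }
}
```

The loop counters are narrowed in the same way; this does not change the sorting behavior.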

In any case, this is only a guess; look at the generated assembly to verify it.

Answer 2

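"I expected it to be faster, since a function call sets up its own stack frame"

Expecting Swap to be inlined is perfectly reasonable. In that case the compiler does essentially the same thing you did by hand, and there is no difference between the two versions.

In fact, I checked the code you posted here with both clang 3.4 (trunk) and gcc 4.7.2 at optimization level -O3, and there is no difference at all between the two versions of your swap (the Swap function vs. the manually inlined swap).

Here is my code: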

#include <algorithm>
#include <cstdio>
#include <numeric>
#include <random>   // std::mt19937, used with std::shuffle below
#include <vector>
#include <boost/chrono.hpp>

void Swap(int& x, int& y)
{
    int temp = x;
    x = y;
    y = temp;
}

void BubbleSort(int data[], int size)
{
    int i, j, temp;
    bool is_sorted = false;
    for (i = 0; i < (size - 1) && !is_sorted; i++)
    {
        is_sorted = true;
        for (j = size - 1; j > i; j--)

            if (data[j] < data[j - 1])
            {
                Swap(data[j],data[j-1]);
                //temp = data[j];
                //data[j] = data[j - 1];
                //data[j-1] = temp;
                is_sorted = false;
            }
    }
}

int main() {

    const int SIZE = 30000;

    std::vector<int> v(SIZE);

    std::iota(v.begin(), v.end(), 0);

    std::shuffle(v.begin(), v.end(), std::mt19937(5489u));

    using namespace boost::chrono;

    auto start = high_resolution_clock::now();

    BubbleSort(v.data(), v.size());

    auto finish = high_resolution_clock::now();

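    // Cast so the argument matches %ld regardless of the duration's underlying type.
    std::printf("%ld  ms\n", static_cast<long>(duration_cast<milliseconds>(finish-start).count()));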
}

I compiled with (clan)g++ -O3 -std=c++11 sort.cpp -lboost_system -lboost_chrono -lrt.

So the problem must lie elsewhere.
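To repeat the comparison without Boost, here is a sketch that times both variants on identical input using std::chrono from the standard library. The names BubbleSortWithSwap, BubbleSortManual, TimeSortMs and MakeShuffledInput are invented for this example; it is not the harness used for the measurement above, and results will vary by compiler and machine.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <numeric>
#include <random>
#include <vector>

void Swap(int& x, int& y)
{
    int temp = x;
    x = y;
    y = temp;
}

// Variant 1: the swap goes through the Swap function.
void BubbleSortWithSwap(int data[], int size)
{
    bool is_sorted = false;
    for (int i = 0; i < (size - 1) && !is_sorted; i++)
    {
        is_sorted = true;
        for (int j = size - 1; j > i; j--)
            if (data[j] < data[j - 1])
            {
                Swap(data[j], data[j - 1]);
                is_sorted = false;
            }
    }
}

// Variant 2: the swap is written out by hand.
void BubbleSortManual(int data[], int size)
{
    bool is_sorted = false;
    for (int i = 0; i < (size - 1) && !is_sorted; i++)
    {
        is_sorted = true;
        for (int j = size - 1; j > i; j--)
            if (data[j] < data[j - 1])
            {
                int temp = data[j];
                data[j] = data[j - 1];
                data[j - 1] = temp;
                is_sorted = false;
            }
    }
}

// Builds the values 0..size-1 and shuffles them with a fixed seed,
// so that every run sees the same input.
std::vector<int> MakeShuffledInput(int size)
{
    std::vector<int> v(size);
    std::iota(v.begin(), v.end(), 0);
    std::mt19937 rng(5489u);
    std::shuffle(v.begin(), v.end(), rng);
    return v;
}

// Runs `sort` once on a private copy of `input` and returns the elapsed
// wall-clock time in milliseconds.
long long TimeSortMs(void (*sort)(int[], int), std::vector<int> input)
{
    auto start = std::chrono::steady_clock::now();
    sort(input.data(), static_cast<int>(input.size()));
    auto finish = std::chrono::steady_clock::now();
    // Using the result also keeps the optimizer from discarding the sort.
    assert(std::is_sorted(input.begin(), input.end()));
    return std::chrono::duration_cast<std::chrono::milliseconds>(finish - start).count();
}
```

From main, one would build a single input with MakeShuffledInput(30000) and pass it to TimeSortMs once per variant, ideally repeating each measurement a few times. Because the input is taken by value, each run sorts its own copy, so neither variant is handed data the other has already sorted.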
