Question

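I have implemented two algorithms that sort elements from highest to lowest.

The first takes quadratic time on the real RAM model, and the second takes O(n log(n)) time. The second uses a priority queue to get the reduction.

Here are the timings, which are the output of the program below.

The first column is the size of the random array of integers
The second column is the time in seconds for the O(n^2) technique
The third column is the time in seconds for the O(n log(n)) technique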

 9600  1.92663      7.58865
 9800  1.93705      7.67376
10000  2.08647      8.19094

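Despite this large difference in complexity, the third column is larger than the second for the array sizes considered. Why is that? Is the C++ priority queue implementation slow?

I ran this code on Windows 7 with Visual Studio 2012, 32-bit.

Here is the code: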

#include "stdafx.h"
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <algorithm>
#include <vector>
#include <queue>
#include <Windows.h>
#include <assert.h>

using namespace std;

double time_slower_sort(vector<int>& a) 
{
    LARGE_INTEGER frequency, start,end;
    if (::QueryPerformanceFrequency(&frequency) == FALSE  ) exit(0);
    if (::QueryPerformanceCounter(&start)       == FALSE  ) exit(0);

    for(size_t i=0 ; i < a.size() ; ++i)  
    {

        vector<int>::iterator it = max_element( a.begin() + i ,a.end() ) ;
        int max_value = *it; 
        *it = a[i];
        a[i] = max_value;    

    }
    if (::QueryPerformanceCounter(&end) == FALSE) exit(0);
    return static_cast<double>(end.QuadPart - start.QuadPart) / frequency.QuadPart;
}



double time_faster_sort(vector<int>& a) 
{
    LARGE_INTEGER frequency, start,end;
    if (::QueryPerformanceFrequency(&frequency) == FALSE  ) exit(0);
    if (::QueryPerformanceCounter(&start)       == FALSE  ) exit(0);


    // Push into the priority queue. Logarithmic cost per insertion => O(n log(n)) total insertion cost
    priority_queue<int> pq;
    for(size_t i=0 ; i<a.size() ; ++i)
    {
        pq.push(a[i]);
    }

    // Read off the elements from the priority queue in order of priority.
    // Logarithmic cost per read => O(n log(n)) reading cost for the entire vector
    for(size_t i=0 ; i<a.size() ; ++i)
    {
        a[i] = pq.top();
        pq.pop();
    }
    if (::QueryPerformanceCounter(&end) == FALSE) exit(0);
    return static_cast<double>(end.QuadPart - start.QuadPart) / frequency.QuadPart;

}




int main(int argc, char** argv)
{
    // Iterate over vectors of different sizes and try out the two different variants
    for(size_t N=1000; N<=10000 ; N += 100 ) 
    {

        // initialize two vectors with identical random elements
        vector<int> a(N),b(N);

        // initialize with random elements
        for(size_t i=0 ; i<N ; ++i) 
        {
            a[i] = rand() % 1000; 
            b[i] = a[i];
        }

        // Sort the two different variants and time them  
        cout << N << "  " 
             << time_slower_sort(a) << "\t\t" 
             << time_faster_sort(b) << endl;

        // Sanity check
        for(size_t i=0 ; i<=N-2 ; ++i) 
        {
            assert(a[i] == b[i]); // both should return the same answer
            assert(a[i] >= a[i+1]); // else not sorted
        }

    }
    return 0;
}

Answer 1

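the 3rd column is larger than the second for the array sizes considered.

"Big O" notation only tells you how the running time grows with the input size.

Your times are (or should be)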

A + B*N^2          for the quadratic case,
C + D*N*LOG(N)     for the linearithmic case.

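But it is entirely possible that C is much larger than A, which gives the linearithmic code a longer execution time when N is small enough.

What makes linearithmic growth interesting is that if your input grew from 9600 to 19200 (doubling), the execution time of the quadratic algorithm should roughly quadruple, to about eight seconds, while the linearithmic algorithm should little more than double its execution time.

So the execution time ratio would go from 2:8 to 8:16, i.e. the quadratic algorithm is now only twice as fast.

Double the input size again and 8:16 becomes 32:32; the two algorithms are equally fast when faced with an input of around 40,000.

With an input size of 80,000 the ratio is reversed: four times 32 is 128, while twice 32 is only 64. 128:64 means that the linearithmic algorithm is now twice as fast as the other.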
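The doubling argument above can be replayed mechanically. The following is only a sketch of that simplified reasoning: the function name is made up for illustration, and it assumes the quadratic time grows exactly fourfold and the linearithmic time exactly twofold per doubling, as the answer does.

```cpp
#include <cassert>

// Hypothetical helper (not from the original post): starting from timings
// measured at size n, keep doubling n until the quadratic algorithm is no
// longer the faster one. Assumed growth per doubling: x4 for the quadratic
// time and x2 for the linearithmic time.
double first_size_where_quadratic_loses(double n, double t_quadratic,
                                        double t_linearithmic) {
    assert(n > 0 && t_quadratic > 0 && t_linearithmic > 0);
    while (t_quadratic < t_linearithmic) {
        n *= 2;               // double the input size
        t_quadratic *= 4;     // N^2 quadruples
        t_linearithmic *= 2;  // N log N roughly doubles (simplification)
    }
    return n;
}
```

Called with the rounded figures from the question (size 9600, 2 s and 8 s), this returns 38400, the "around 40,000" of the text. Because N log N really grows by slightly more than a factor of two per doubling, the true break-even point under this model sits somewhat later.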

You should run tests with very different sizes, perhaps N, 2*N and 4*N, in order to get a better estimate of your A, B, C and D constants.
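As a rough sketch of how such an estimate could be made, two measurements at different sizes are enough to solve for an offset and a slope. The struct and function names below are hypothetical, and the model (time = offset + slope * x, with x being N^2 or N*log2(N)) is the one assumed in this answer.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: time = offset + slope * x.
struct TwoPointFit {
    double offset;  // A (or C) in the model above
    double slope;   // B (or D) in the model above
};

// The x value for each model at input size n.
double quadratic_term(double n) { return n * n; }
double linearithmic_term(double n) { return n * std::log2(n); }

// Solve the two equations t1 = offset + slope*x1 and t2 = offset + slope*x2.
TwoPointFit fit_two_points(double x1, double t1, double x2, double t2) {
    assert(x1 != x2);
    TwoPointFit fit;
    fit.slope = (t2 - t1) / (x2 - x1);
    fit.offset = t1 - fit.slope * x1;
    return fit;
}
```

For the quadratic algorithm you would pass quadratic_term(N) and quadratic_term(2*N) together with the two measured times; a third measurement at 4*N then shows how well the fitted model predicts. Real timings are noisy, so each point should be the best of several runs.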

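What this all boils down to is: do not rely blindly on Big O classification. Use it if you expect your input to grow large; but for small inputs, it may well be that a less scalable algorithm turns out to be more efficient.

For example, here you can see that for small input sizes the faster algorithm is the one running in exponential time, which is hundreds of times faster than the logarithmic one. But as soon as the input size grows past 9, the exponential algorithm's running time skyrockets, while the other's does not.

You might even decide to implement both versions of an algorithm and use one or the other depending on the input size. Some recursive algorithms do exactly this, switching to an iterative implementation for the last iterations. In the pictured case, you could implement the best algorithm for each size range; but the best compromise is to go with only two algorithms, using the quadratic one up to N = 15 and switching to the logarithmic one afterwards.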
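Applied to the sorting problem in the question, choosing by input size might look like the sketch below. The function names are invented for illustration, and the default threshold of 15 is simply borrowed from the example above; a real cut-off would have to be measured on the target machine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Quadratic variant: selection sort, largest element first (same idea as the
// question's slower routine).
void selection_sort_descending(std::vector<int>& a) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::vector<int>::iterator it = std::max_element(a.begin() + i, a.end());
        std::iter_swap(a.begin() + i, it);
    }
}

// O(n log n) variant: in-place heap sort using the standard heap algorithms.
// With std::greater the final order is descending.
void heap_sort_descending(std::vector<int>& a) {
    std::make_heap(a.begin(), a.end(), std::greater<int>());
    std::sort_heap(a.begin(), a.end(), std::greater<int>());
}

// Hypothetical dispatcher: pick the algorithm from the input size.
void hybrid_sort_descending(std::vector<int>& a, std::size_t threshold = 15) {
    assert(threshold > 0);
    if (a.size() <= threshold) {
        selection_sort_descending(a);
    } else {
        heap_sort_descending(a);
    }
}
```

Both branches produce the same ordering, so callers never need to know which one ran.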

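I found here a reference to Introsort, which

is a sorting algorithm that initially uses Quicksort, but switches to Heapsort when the recursion depth exceeds a level based on the logarithm of the number of elements being sorted, and uses Insertion sort for small cases because of its good locality of reference, i.e. when the data most likely resides in memory and is easily referenced.

In the above case, Insertion sort exploits memory locality, which means that its B constant is very small; a recursive algorithm is likely to incur higher costs and have a significant value of C. So for small datasets, more compact algorithms do well even if their Big O classification is poorer.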

Answer 2

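Your O(N^2) algorithm runs 4 times faster than the O(N log N) algorithm. Or at least you think it does.

The obvious thing to do is to verify your assumption. There is not much you can conclude from sizes 9600, 9800 and 10000. Try sizes 1000, 2000, 4000, 8000, 16000, 32000. Does the first algorithm increase its time by a factor of 4 each time, as it should? Does the second algorithm increase its time by a factor slightly larger than 2 each time, as it should?

If yes, then O(N^2) and O(N log N) look right, but the second has massive constant factors. If no, then your assumption about execution speed is wrong, and you start investigating why. O(N log N) taking 4 times longer than O(N * N) at N = 10,000 would be highly unusual and looks very suspicious.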
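A minimal sketch of such a doubling test, using the portable std::chrono clock rather than the Windows timer from the question. The helper names are assumptions made for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Time a single call of f() in seconds with a monotonic clock.
template <typename F>
double seconds_taken(F&& f) {
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    f();
    const std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Given timings for sizes N, 2N, 4N, ..., return each timing divided by the
// previous one. Ratios near 4 suggest quadratic growth; ratios a little
// above 2 suggest N log N growth.
std::vector<double> growth_ratios(const std::vector<double>& times) {
    std::vector<double> ratios;
    for (std::size_t i = 1; i < times.size(); ++i) {
        assert(times[i - 1] > 0.0);
        ratios.push_back(times[i] / times[i - 1]);
    }
    return ratios;
}
```

One way to use it is to wrap each sort in a lambda, collect seconds_taken for sizes 1000, 2000, 4000 and so on into a vector, and then inspect growth_ratios of that vector for each algorithm separately.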

Answer 3

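Visual Studio must have extreme overhead for non-optimized/debug-level std:: code, in particular the priority queue class. Check out @msandifords comment.

I tested your program with g++, first with no optimizations.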

9800  1.42229       0.014159
9900  1.45233       0.014341
10000  1.48106      0.014606

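Notice that my vector times are close to yours. The priority queue times, on the other hand, are orders of magnitude smaller. This suggests a debug-friendly and very slow implementation of the priority queue, which contributes a large part of the constant that Hot Licks mentions in the comments.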
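Before comparing timings it can help to print how the benchmark was actually built. The sketch below relies only on predefined macros: `_DEBUG` and `_ITERATOR_DEBUG_LEVEL` on Visual C++, `__OPTIMIZE__` on GCC-compatible compilers, and the standard `NDEBUG`. The function name is made up for this example, and the wording of the messages is an interpretation, not compiler output.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: describe the build settings that commonly affect how
// fast standard-library containers run.
std::string build_configuration() {
    std::string s;
#if defined(_MSC_VER)
    s += "MSVC";
  #if defined(_DEBUG)
    s += ", _DEBUG defined (debug runtime)";
  #endif
  #if defined(_ITERATOR_DEBUG_LEVEL)
    s += ", _ITERATOR_DEBUG_LEVEL=" + std::to_string(_ITERATOR_DEBUG_LEVEL);
  #endif
#elif defined(__GNUC__)
    s += "GCC-compatible";
  #if defined(__OPTIMIZE__)
    s += ", optimizing";
  #else
    s += ", not optimizing";
  #endif
#else
    s += "unknown compiler";
#endif
#if defined(NDEBUG)
    s += ", NDEBUG defined (assert disabled)";
#endif
    return s;
}
```

Printing this string at the top of the benchmark output makes it obvious when two sets of numbers come from different build modes.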

Then with -O3, full optimization (close to release mode).

1000  0.000837      7.4e-05

9800  0.077041      0.000754
9900  0.078601      0.000762
10000  0.080205     0.000771

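Now, to see whether this is reasonable, you can plug the numbers into the simple formula that the complexity implies.

time = k * N * N;  // N = 1000 took about 0.0008s
k = 8E-10

Calculate for N = 10000

time = k * 10000 * 10000 //    Which conveniently gives 
time = 0.08

A perfect result, consistent with O(N²) and a good implementation. The same goes, of course, for the O(N log N) part.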
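The same back-of-the-envelope check can be written as a few small functions. This is only a sketch: the function names are invented here, and the models assumed are time = k * N * N and time = k * N * log2(N) with no constant offset.

```cpp
#include <cassert>
#include <cmath>

// Estimate k in time = k * N * N from one measurement.
double quadratic_constant(double n, double seconds) {
    assert(n > 0);
    return seconds / (n * n);
}

// Predict the time at another size with that constant.
double predict_quadratic(double k, double n) { return k * n * n; }

// The same two steps for time = k * N * log2(N).
double linearithmic_constant(double n, double seconds) {
    assert(n > 1);
    return seconds / (n * std::log2(n));
}

double predict_linearithmic(double k, double n) {
    return k * n * std::log2(n);
}
```

Fed with the -O3 row for N = 1000 (0.000837 s and 7.4e-05 s), the quadratic prediction for N = 10000 is 0.0837 s against a measured 0.080205 s, and the linearithmic prediction is roughly 0.00099 s against a measured 0.000771 s, which is the same order of magnitude.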

Answer 4

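I think the issue is really more subtle than one might expect. In your O(N^2) solution you make no allocations: the algorithm works in place, searching for the largest element and swapping it into the current position. This is fine.

But the O(N log N) priority_queue version is different (internally, priority_queue uses a std::vector by default to store its state). When you push elements one at a time, that vector sometimes needs to grow (and it does), and this is time you do not lose in the O(N^2) version. If you make the following small change to the initialization of the priority_queue:

priority_queue<int> pq(a.begin(), a.end()); instead of the for loop

then the O(N log N) version beats the O(N^2) version, as it should, by a fair margin. With the proposed change there is still an allocation in the priority_queue version, but only one (you save a lot of allocations for large vector sizes, and allocation is one of the significant time-consuming operations). The initialization may also be able to take advantage of having the complete contents of the priority_queue up front and run in O(N); I don't know whether the STL really does this.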
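To make the options concrete, here is a small sketch with three ways of filling the queue. The function names are invented for this example. For reference, the C++ standard specifies the range constructor as copying the range into the underlying container and then calling std::make_heap, which uses a linear number of comparisons. The third variant assumes that a moved-from vector hands over its reserved capacity, which is what implementations do in practice.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Pop everything off the queue, largest first.
std::vector<int> drain(std::priority_queue<int>& pq) {
    std::vector<int> out;
    out.reserve(pq.size());
    while (!pq.empty()) {
        out.push_back(pq.top());
        pq.pop();
    }
    return out;
}

// Variant 1: the question's approach, one push per element.
std::vector<int> sorted_desc_by_pushing(const std::vector<int>& in) {
    std::priority_queue<int> pq;
    for (int x : in) pq.push(x);
    return drain(pq);
}

// Variant 2: the range constructor suggested in this answer.
std::vector<int> sorted_desc_by_range(const std::vector<int>& in) {
    std::priority_queue<int> pq(in.begin(), in.end());
    return drain(pq);
}

// Variant 3: keep the push loop, but give the queue a pre-reserved vector so
// that the pushes should not need to reallocate.
std::vector<int> sorted_desc_by_reserved_pushes(const std::vector<int>& in) {
    std::vector<int> storage;
    storage.reserve(in.size());
    std::priority_queue<int> pq(std::less<int>(), std::move(storage));
    for (int x : in) pq.push(x);
    return drain(pq);
}
```

All three return the same descending sequence, so timing them against each other separates the cost of reallocation from the cost of the heap operations themselves.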

Sample code (to compile and run):

#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <algorithm>
#include <vector>
#include <queue>
#include <Windows.h>
#include <assert.h>

using namespace std;

double time_slower_sort(vector<int>& a) {
    LARGE_INTEGER frequency, start, end;
    if (::QueryPerformanceFrequency(&frequency) == FALSE)
        exit(0);
    if (::QueryPerformanceCounter(&start) == FALSE)
        exit(0);

    for (size_t i = 0; i < a.size(); ++i) {

        vector<int>::iterator it = max_element(a.begin() + i, a.end());
        int max_value = *it;
        *it = a[i];
        a[i] = max_value;
    }
    if (::QueryPerformanceCounter(&end) == FALSE)
        exit(0);
    return static_cast<double>(end.QuadPart - start.QuadPart) /
           frequency.QuadPart;
}

double time_faster_sort(vector<int>& a) {
    LARGE_INTEGER frequency, start, end;
    if (::QueryPerformanceFrequency(&frequency) == FALSE)
        exit(0);
    if (::QueryPerformanceCounter(&start) == FALSE)
        exit(0);

    // Build the priority queue from the whole range at once instead of
    // pushing element by element
    priority_queue<int> pq(a.begin(), a.end());  // <----- THE ONLY CHANGE IS HERE

    // Read off the elements from the priority queue in order of priority.
    // Logarithmic cost per read => O(n log(n)) reading cost for the entire
    // vector
    for (size_t i = 0; i < a.size(); ++i) {
        a[i] = pq.top();
        pq.pop();
    }
    if (::QueryPerformanceCounter(&end) == FALSE)
        exit(0);
    return static_cast<double>(end.QuadPart - start.QuadPart) /
           frequency.QuadPart;
}

int main(int argc, char** argv) {
    // Iterate over vectors of different sizes and try out the two different
    // variants
    for (size_t N = 1000; N <= 10000; N += 100) {

        // initialize two vectors with identical random elements
        vector<int> a(N), b(N);

        // initialize with random elements
        for (size_t i = 0; i < N; ++i) {
            a[i] = rand() % 1000;
            b[i] = a[i];
        }

        // Sort the two different variants and time them
        cout << N << "  " << time_slower_sort(a) << "\t\t"
             << time_faster_sort(b) << endl;

        // Sanity check
        for (size_t i = 0; i <= N - 2; ++i) {
            assert(a[i] == b[i]);     // both should return the same answer
            assert(a[i] >= a[i + 1]); // else not sorted
        }
    }
    return 0;
}

On my PC (Core 2 Duo 6300), the output obtained is:

1100  0.000753738      0.000110263
1200  0.000883201      0.000115749
1300  0.00103077       0.000124526
1400  0.00126994       0.000250698
...
9500  0.0497966        0.00114377
9600  0.051173         0.00123429
9700  0.052551         0.00115804
9800  0.0533245        0.00117614
9900  0.0555007        0.00119205
10000 0.0552341        0.00120466

Why is my n log(n) heapsort slower than my n^2 selection sort
