Question

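Is it normal for a basic quicksort in Java, using the last array element as the pivot, to take 5 hours to sort an array of 100,000,000 random numbers?

My system specs: Mac OS X Lion 10.7.2 (2011), Intel Core i5 2.3 GHz, 8 GB RAM

Update 2: So I think I'm doing something wrong in one of my other methods, since Narendra was able to run the quicksort. Here is the full code I'm trying to run.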

import java.util.Random;

public class QuickSort {
public static long comparisons = 0; // long: an int counter overflows at this input size

public static void main(String[] args) {
    int size = 100000000;
    int[] smallSampleArray = createArrayOfSize(size);

    System.out.println("Starting QS1...");
    long startTime = System.currentTimeMillis();
    quickSort(smallSampleArray,0,size-1);
    System.out.println("Finished QS1 in " + (System.currentTimeMillis() - startTime) + " ms");
    System.out.println("Number of comparisons for QS1: " + comparisons);

}

public static int[] createArrayOfSize(int arraySize) {
    int[] anArray = new int[arraySize];
    Random random = new Random();

    for(int x=0; x < anArray.length; x++ ) {
        anArray[x] = random.nextInt(1000) + 1;
    }
    return anArray;
}


public static void quickSort( int anArray[], int position, int pivot) {

    if( position < pivot ) {
        int q = partition(anArray, position, pivot);

        quickSort(anArray, position, q-1);
        quickSort(anArray, q+1, pivot);

    }

}

public static int partition(int anArray[], int position, int pivot ) {
    int x = anArray[pivot];
    int i = position - 1; 

    for(int j = position; j < pivot; j++ ) {
        comparisons++;
        if(anArray[j] <= x) {
             i = i + 1;
             int temp =  anArray[i];
             anArray[i] = anArray[j];
             anArray[j] = temp;
        }

    }
    int temp = anArray[i+1];
    anArray[i+1] = anArray[pivot];
    anArray[pivot] = temp;

    return i+1;
}

}

Answer 1

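I have moved the old, now irrelevant answer to the end.

Edit x2

Aha! I think I've found the cause of your poor performance. You told us you're using randomized data. That's true. But what you didn't tell us is that you're using such a small range of random values.

For me, your code performs very well if you change this line: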

anArray[x] = random.nextInt(1000) + 1;

to this:

anArray[x] = random.nextInt();

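This defies expectations, right? Sorting a smaller range of values should be cheaper, because there should be fewer swaps to do, right? So why does this happen? It's because you have so many elements with the same value (on average 100,000 of each). So why does that lead to such terrible performance? Well, say that at every step you chose the perfect pivot value: exactly halfway. This is what it would look like: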

1000 - Pivot: 500
 - 500+ - Pivot: 750
   - 750+ - Pivot: 875
   - 750- - Pivot: 625
 - 500- - Pivot: 250

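And so on. However (and here is the crucial part), you eventually reach a partition operation in which every value equals the pivot value. In other words, there will be a big (100,000-element) block of numbers that all have the same value, and you will try to sort it recursively. How will that go? It will recurse 100,000 times, removing only a single pivot value at each level. In other words, it partitions everything to the left, or everything to the right.

Expanding on the breakdown above, it would look a bit like this (I've used 8, a power of 2, for simplicity, and pardon the poor graphical representation):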

Depth Min  Max  Pvt NumElements

0     0     7    4   100 000 000
1     0     3    2    50 000 000    
2     0     1    1    25 000 000
3     0     0    0    12 500 000 < at this point, you're
4     0     0    0    12 499 999 < no longer dividing and
5     0     0    0    12 499 998 < conquering effectively.
3     1     1    1    12 500 000
4     1     1    1    12 499 999
5     1     1    1    12 499 998
2     2     3    3    25 000 000
3     ...    
3     ...    
1     4     7    6    50 000 000    
2     4     5    5    25 000 000
3     ...
3     ...    
2     6     7    7    25 000 000
3     ...
3     ...
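
To put a rough number on that degenerate case, here is a small sketch (my own illustration, not code from the question) that runs the same last-element partition scheme on an array of identical values and counts comparisons. The class name DuplicateCost and the helper countForAllEqual are made up for this example. For n equal values the count should come out to n(n-1)/2, so a block of 100,000 equal values costs roughly 5 billion comparisons, and about a thousand such blocks would land somewhere around 5 trillion.

```java
import java.util.Arrays;

public class DuplicateCost {
    static long comparisons = 0; // long, so the counter cannot overflow here

    // Lomuto-style partition with the last element as pivot, as in the question.
    static int partition(int[] a, int lo, int hi) {
        int x = a[hi];
        int i = lo - 1;
        for (int j = lo; j < hi; j++) {
            comparisons++;
            if (a[j] <= x) {
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[i + 1]; a[i + 1] = a[hi]; a[hi] = t;
        return i + 1;
    }

    static void quickSort(int[] a, int lo, int hi) {
        if (lo < hi) {
            int q = partition(a, lo, hi);
            quickSort(a, lo, q - 1);
            quickSort(a, q + 1, hi);
        }
    }

    // Hypothetical helper: sorts n copies of one value, returns the comparison count.
    static long countForAllEqual(int n) {
        int[] a = new int[n];
        Arrays.fill(a, 7);
        comparisons = 0;
        quickSort(a, 0, n - 1);
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 2000; // kept small so the recursion depth stays modest
        System.out.println(countForAllEqual(n) + " comparisons, n(n-1)/2 = "
                + (long) n * (n - 1) / 2);
    }
}
```

Note that the recursion depth grows with the block size too, which is why the demo uses a small n.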

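If you want to counter this, you need to optimize the code to reduce the impact of this problem. More on that (I hope) to come...

...continuing on. A simple way to address the problem is to check at each step whether the array is already sorted.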

public static void quickSort(int anArray[], int position, int pivot) {

    if (isSorted(anArray, position, pivot + 1)) {
        return;
    }

    //...
}


private static boolean isSorted(int[] a, int start, int end) {
    for (int i = start+1; i < end; i++) {
        if (a[i] < a[i-1]) {
            return false;
        }
    }
    return true;
}

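Add that in and you won't recurse unnecessarily, and you should be golden. In fact, you get better performance than with values randomized over all 32 bits of an integer.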
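
A different way to tackle the same problem, which is not what the isSorted check above does, is three-way partitioning: split each range into "less than", "equal to" and "greater than" the pivot, and never recurse into the "equal" block. Below is a minimal sketch under that assumption; the class name ThreeWayQuickSort and its helpers are hypothetical, and I have not benchmarked it against the original code.

```java
import java.util.Arrays;
import java.util.Random;

public class ThreeWayQuickSort {
    // Sorts a[lo..hi] (inclusive) using three-way partitioning.
    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        int pivot = a[hi]; // last element as pivot, as in the question
        int lt = lo;       // a[lo..lt-1] holds values less than pivot
        int gt = hi;       // a[gt+1..hi] holds values greater than pivot
        int i = lo;        // a[lt..i-1] holds values equal to pivot
        while (i <= gt) {
            if (a[i] < pivot) {
                swap(a, lt++, i++);
            } else if (a[i] > pivot) {
                swap(a, i, gt--);
            } else {
                i++;
            }
        }
        // Everything in a[lt..gt] equals the pivot and is already in place.
        sort(a, lo, lt - 1);
        sort(a, gt + 1, hi);
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        Random random = new Random();
        int[] data = new int[1_000_000];
        for (int k = 0; k < data.length; k++) {
            data[k] = random.nextInt(1000) + 1; // same narrow range as the question
        }
        int[] expected = data.clone();
        Arrays.sort(expected);
        sort(data, 0, data.length - 1);
        System.out.println("Matches Arrays.sort: " + Arrays.equals(data, expected));
    }
}
```

With this scheme a block of 100,000 equal values is finished in a single pass instead of 100,000 levels of recursion.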

Old answer (for posterity only)

Your partition logic looks really suspect to me. Let's extract it and ignore the swap logic. This is what you have:

    int i = position - 1; 

    for(int j = position; j < pivot; j++ ) {

        if(anArray[j] <= x) {
             i = i + 1;
             swap(anArray, i, j);
        } 

    }

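I can't see how this would work. For example, if the first value is less than the pivot value, it will be swapped with itself?

I think you want something like this (just a rough sketch):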

int pivotValue = anArray[pivot];
int i = position, j = pivot - 1;

while ( i <= j ) {
   if ( anArray[i] > pivotValue ) {
      //i is now the earliest index whose value is greater than pivotValue,
      //so find the latest index whose value is not greater than pivotValue
      while ( j > i && anArray[j] > pivotValue ) {
         j--;
      }

      //if j reaches i then everything before i is <= pivotValue and
      //everything from i onwards is greater, so we should break out here
      if ( j == i ) {
         break;
      }

      swap(anArray, i, j);
      j--;
   }
   i++;
}

//swap pivot into correct position
swap(anArray, pivot, i);

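Edit

I think I understand the original partition logic now (I had misread the if-block as looking for elements greater than the pivot). I'll leave my answer up since it might offer better performance, but I doubt it will make a significant difference.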

Answer 2

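Being a C# guy, I just pasted the code above into an empty C# project. Sorting an array of 100,000,000 integers took 35 seconds. There doesn't seem to be anything wrong with the code, so there must be something else going on in your environment. Is the Java process allowed to allocate ~800 MB of RAM?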
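
One way to check the memory question from inside the JVM is to compare the maximum heap against what the array needs. This is only a sketch: the class name HeapCheck and the helper bytesNeededForInts are made up for illustration, and the figure covers the raw int payload alone (4 bytes per element, so about 400 MB for 100,000,000 ints), not any other allocations.

```java
public class HeapCheck {
    // Hypothetical helper: raw payload size of an int[] with the given length.
    static long bytesNeededForInts(long count) {
        return count * Integer.BYTES;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // upper bound the JVM will try to use
        long needed = bytesNeededForInts(100_000_000L);
        System.out.println("Max heap:     " + maxHeap / (1024 * 1024) + " MB");
        System.out.println("Array needs:  " + needed / (1024 * 1024) + " MB");
        System.out.println("Fits in heap: " + (needed < maxHeap));
    }
}
```

If the heap turns out to be too small, it can be raised when launching the program, for example with java -Xmx2g QuickSort.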

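What happens if you reduce the array size to 10,000,000? Do you get close to ~3 seconds? Is there some array size at which the sort suddenly becomes slow?

Edit

I'm almost certain that your array isn't random; your random initialization is probably failing.

If you create a new Random object for each element, every element will be initialized with the same value, because each new Random is seeded with the current time in milliseconds. If the whole array is initialized within the same millisecond, all elements get the same value.

In C# I initialize like this: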

Random r = new Random();
var intArr = (from i in Enumerable.Range(0, 10000) select r.Next()).ToArray();
var sw = System.Diagnostics.Stopwatch.StartNew();
quickSort(intArr, 0, intArr.Length - 1);
sw.Stop();

Sorting takes 2 ms.

If I re-initialize my Random object for every element:

var intArr = (from i in Enumerable.Range(0, 10000) select (new Random()).Next()).ToArray();

then sorting takes 300 ms, because all the elements in the array have the same value.
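
The same effect can be reproduced in Java, with one caveat: as far as I know, Java's no-argument Random constructor tries to pick a distinct seed on each call, so this sketch passes an explicit seed to force the collision. The class name SeedDemo and its helpers are hypothetical and exist only for this illustration.

```java
import java.util.Random;

public class SeedDemo {
    // Fills an array either from one shared generator or from a generator
    // that is re-created with the same seed for every element.
    static int[] fill(int n, boolean reseedEachTime, long seed) {
        int[] a = new int[n];
        Random shared = new Random(seed);
        for (int k = 0; k < n; k++) {
            Random r = reseedEachTime ? new Random(seed) : shared;
            a[k] = r.nextInt(1000) + 1;
        }
        return a;
    }

    static boolean allEqual(int[] a) {
        for (int k = 1; k < a.length; k++) {
            if (a[k] != a[0]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("Re-seeded every element, all equal: "
                + allEqual(fill(100, true, 42L)));
        System.out.println("One shared generator, all equal:    "
                + allEqual(fill(100, false, 42L)));
    }
}
```

The code in the question creates a single Random outside the loop, so it should correspond to the shared-generator case here.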
