Question

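I am timing array-based implementations of insertion sort on arrays of different lengths.

I know insertion sort is O(n^2) in the average case, so I realize it takes a little while when you try to sort large arrays. But why is there a difference of nearly 6500 ms between the two implementations at the bottom for an array of 100,000 entries?

Here is the setup of my array, filled with random integers from 1 to 1 million: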

int[] oneHundredThousand = new int[100000];
Random r = new Random();//RANDOMIZER

for(int i = 0; i < oneHundredThousand.length; i++)
        oneHundredThousand[i] = r.nextInt(1000000) + 1; //1 - 1000000

Here are the two test methods I ran, both of which perform an insertion sort:

public static long insertionSort1(int[] intArray) {
    long startTime = System.currentTimeMillis();

    int n = intArray.length;
    for (int k = 1; k < n; k++) {         
        int cur = intArray[k];                
        int j = k;                          
        while (j > 0 && intArray[j-1] > cur) {  
            intArray[j] = intArray[j-1];              
            j--;                              
        }
        intArray[j] = cur;                      
    }

    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

and

public static long insertionSort2(int[] input){
    long startTime = System.currentTimeMillis();
    int temp;
    for (int i = 1; i < input.length; i++) {
        for(int j = i ; j > 0 ; j--){
            if(input[j] < input[j-1]){
                temp = input[j];
                input[j] = input[j-1];
                input[j-1] = temp;
            }
        }
    }
    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

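Now, calling these methods in main (copying the array so that each sort starts from the original order), I get the results shown in the comments. Why are they so different? I would not expect different implementations of the same algorithm to differ this much in efficiency.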

int[] copy100_1 = Arrays.copyOf(oneHundredThousand, oneHundredThousand.length);
int[] copy100_2 = Arrays.copyOf(oneHundredThousand, oneHundredThousand.length);

//816ms 
System.out.print(insertionSort1(copy100_1));
//7400ms
System.out.print(insertionSort2(copy100_2));

Answer 1

Analyzing insertion sort, one finds that its best-case running time is O(n), even though the average and worst cases are O(n²). Let us take your first implementation to see why.

public static long insertionSort1(int[] intArray) {
    long startTime = System.currentTimeMillis();

    int n = intArray.length;
    for (int k = 1; k < n; k++) {
        int cur = intArray[k];
        int j = k;

        while (j > 0 && intArray[j-1] > cur) { // This loop can break early due
                                               // to intArray[j-1] > cur
            intArray[j] = intArray[j-1];
            j--;
        }
        intArray[j] = cur;
    }

    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

This means that for a (partially) sorted array, the running time (for that part) is reduced to O(n).

Your second implementation lacks such an early break condition. But you can add it quite simply:

public static void insertionSort2(int[] input) {
    int temp;
    for (int i = 1; i < input.length; i++) {
        for (int j = i; j > 0; j--) {
            if (input[j] < input[j - 1]) {
                temp = input[j];
                input[j] = input[j - 1];
                input[j - 1] = temp;
            } else {
                break; // this is the "early break" missing.
            }
        }
    }
}

I have taken @amanin's answer as a template and re-implemented all three versions.

package benchmark;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;

@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Test {

    @State(Scope.Benchmark)
    public static class Input {

        public static final Random rng = new Random();
        final int[] array;
        final int[] expected;

        public Input() {
            this.array = new int[200_000];
            for (int i = 0; i < this.array.length; i++) {
                this.array[i] = i;
            }
            this.expected = Arrays.copyOf(this.array, this.array.length);

            // Fisher-Yates shuffle (bound i + 1, so an element may also stay in place)
            for (int i = this.array.length - 1; i > 0; --i) {
                int swap = Input.rng.nextInt(i + 1);
                int tmp = this.array[swap];
                this.array[swap] = this.array[i];
                this.array[i] = tmp;
            }
        }
    }

    @Benchmark
    public void benchSort1(final Input in) {
        insertionSort1(in.array);
    }

    @Benchmark
    public void benchSort2(final Input in) {
        insertionSort2(in.array);
    }

    @Benchmark
    public void benchSort3(final Input in) {
        insertionSort3(in.array);
    }

    // Shift-based version from the question
    public static void insertionSort1(int[] intArray) {
        int n = intArray.length;
        for (int k = 1; k < n; k++) {
            int cur = intArray[k];
            int j = k;
            while (j > 0 && intArray[j - 1] > cur) {
                intArray[j] = intArray[j - 1];
                j--;
            }
            intArray[j] = cur;
        }
    }

    // Swap-based version from the question, without early break
    public static void insertionSort2(int[] input) {
        int temp;
        for (int i = 1; i < input.length; i++) {
            for (int j = i; j > 0; j--) {
                if (input[j] < input[j - 1]) {
                    temp = input[j];
                    input[j] = input[j - 1];
                    input[j - 1] = temp;
                }
            }
        }
    }

    // Swap-based version with the early break added
    public static void insertionSort3(int[] input) {
        int temp;
        for (int i = 1; i < input.length; i++) {
            for (int j = i; j > 0; j--) {
                if (input[j] < input[j - 1]) {
                    temp = input[j];
                    input[j] = input[j - 1];
                    input[j - 1] = temp;
                } else {
                    break;
                }
            }
        }
    }

    public static void main(String[] arg) throws Exception {
        Options option = new OptionsBuilder().include(Test.class.getSimpleName()).build();
        new Runner(option).run();
    }
}

These are the final results, with benchSort1 and benchSort2 being your original versions and benchSort3 being the "corrected" double-for version:

[results table not preserved in the source]

As you can see, both versions are now very close in running time.
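The effect of the early break can also be seen without any timing, by counting comparisons. The following is a minimal sketch, not part of the original answer: the class name ComparisonCountDemo, the earlyBreak flag, the array size and the fixed seed are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.Random;

class ComparisonCountDemo {

    /** Swap-based insertion sort; returns the number of element comparisons made. */
    static long sortAndCount(int[] a, boolean earlyBreak) {
        long comparisons = 0;
        for (int i = 1; i < a.length; i++) {
            for (int j = i; j > 0; j--) {
                comparisons++;
                if (a[j] < a[j - 1]) {
                    int tmp = a[j];
                    a[j] = a[j - 1];
                    a[j - 1] = tmp;
                } else if (earlyBreak) {
                    break; // a[0..j] is already in order, nothing left to do for this i
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 2_000;             // hypothetical size, small enough to run instantly
        Random r = new Random(42); // fixed seed so that runs are repeatable
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = r.nextInt(1_000_000) + 1;
        }

        // Each variant sorts its own copy, so both see the same unsorted input.
        long withoutBreak = sortAndCount(Arrays.copyOf(data, n), false);
        long withBreak = sortAndCount(Arrays.copyOf(data, n), true);

        System.out.println("without break: " + withoutBreak);
        System.out.println("with break:    " + withBreak);
    }
}
```

Without the break, the inner loop always runs to j = 1, so the count is n(n-1)/2 regardless of the input. With the break, the count depends on how far each element has to travel, and drops to n-1 on an already sorted array.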

Answer 2

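To complement the answer above, I would like to point out that to compare the performance of two algorithms properly, you should consider using a real benchmark framework such as JMH.

So, why? With a simple main, your program depends too much on the state of your computer: if, during the second sort, your computer starts swapping, or another program consumes a lot of processing power, the performance of your Java application drops, and so does your measurement of the second algorithm. Also, you have no control over JIT optimizations.

JMH tries to provide more reliable measurements by repeating the tests many times, by running a warm-up phase so that the JIT can kick in before measurement starts, and so on.

Here is an example class that benchmarks your algorithms: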
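To make the idea of warm-up and repetition concrete, here is a hand-rolled sketch in plain Java. It is not JMH and not a substitute for it: the class ManualWarmupSketch, its method names, and the round counts are assumptions made for illustration, and a loop like this still lacks the safeguards a real harness provides.

```java
import java.util.Arrays;
import java.util.Random;

class ManualWarmupSketch {

    /** Shift-based insertion sort, same logic as insertionSort1 in the question. */
    static void insertionSort(int[] a) {
        for (int k = 1; k < a.length; k++) {
            int cur = a[k];
            int j = k;
            while (j > 0 && a[j - 1] > cur) {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = cur;
        }
    }

    /** Sorts a fresh copy of the input and returns the elapsed nanoseconds. */
    static long timeOnce(int[] original) {
        int[] copy = Arrays.copyOf(original, original.length);
        long start = System.nanoTime();
        insertionSort(copy);
        return System.nanoTime() - start;
    }

    /** Discards the warm-up rounds, then averages the measured rounds. */
    static double averageNanos(int[] original, int warmups, int rounds) {
        for (int i = 0; i < warmups; i++) {
            timeOnce(original); // result ignored: gives the JIT a chance to compile the hot loop
        }
        long total = 0;
        for (int i = 0; i < rounds; i++) {
            total += timeOnce(original);
        }
        return (double) total / rounds;
    }

    public static void main(String[] args) {
        Random r = new Random(1); // fixed seed, an arbitrary choice
        int[] data = new int[5_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = r.nextInt(1_000_000) + 1;
        }
        System.out.printf("average: %.3f ms%n", averageNanos(data, 5, 10) / 1_000_000.0);
    }
}
```

Every round sorts a copy, so the original array stays unsorted and each measurement sees the same input.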

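import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.Random;
import java.util.concurrent.TimeUnit;

// These annotations control the benchmark configuration. See the JMH documentation for details.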
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class SortingBenchmark {

/**
 * We create objects in charge of preparing data required by our benchmarks.
 * It is created before benchmark starts, so array initialization phase does
 * not pollute measures.
 */
@State(Scope.Benchmark)
public static class Input {

    final int[] array;

    public Input() {
        final Random r = new Random();
        array = new int[100000];
        for (int i = 0; i < array.length; i++) {
            array[i] = r.nextInt(1000000) + 1;
        }
    }
}

/**
 * Test first sorting method
 * @param in
 */
@Benchmark
public void benchSort1(final Input in) {
    insertionSort1(in.array);
}

/**
 * Test second sorting method
 * @param in
 */
@Benchmark
public void benchSort2(final Input in) {
    insertionSort2(in.array);
}

public static long insertionSort1(int[] intArray) {
    long startTime = System.currentTimeMillis();

    int n = intArray.length;
    for (int k = 1; k < n; k++) {
        int cur = intArray[k];
        int j = k;
        while (j > 0 && intArray[j - 1] > cur) {
            intArray[j] = intArray[j - 1];
            j--;
        }
        intArray[j] = cur;
    }

    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

public static long insertionSort2(int[] input) {
    long startTime = System.currentTimeMillis();
    int temp;
    for (int i = 1; i < input.length; i++) {
        for (int j = i; j > 0; j--) {
            if (input[j] < input[j - 1]) {
                temp = input[j];
                input[j] = input[j - 1];
                input[j - 1] = temp;
            }
        }
    }
    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

/**
 * That's JMH boilerplate to launch the benchmark.
 * @param arg
 * @throws Exception
 */
public static void main(String[] arg) throws Exception {
    Options option = new OptionsBuilder().include(SortingBenchmark.class.getSimpleName()).build();
    new Runner(option).run();
}

} // closes SortingBenchmark

Here is the result (only JMH's final summary, to avoid cluttering the thread):

# Run complete. Total time: 00:01:51

Benchmark                    Mode  Cnt     Score    Error  Units
SortingBenchmark.benchSort1  avgt    5     0,190 ±  0,016  ms/op
SortingBenchmark.benchSort2  avgt    5  1957,455 ± 73,014  ms/op

I hope this helps you with your future tests (note: here is a short tutorial that helped me learn JMH: JMH tuto).

Why does insertion sort run so differently in time across different implementations?
