在阅读了一些关于堆/优先级队列之后,我最近自己实现了一个。之后我决定将我的实现性能与我在书中找到的性能进行比较,结果对我来说有点混乱。看起来这两种实现的插入方法之间存在巨大的性能差异。
我使用此代码测试两个堆:
Random rnd = new Random();
long startTime = System.currentTimeMillis();
for(int i = 0; i < 1_000_000_0; i++) heap.insert(rnd.nextInt(1000));
System.out.println(System.currentTimeMillis() - startTime);
当我使用我的堆实现运行它时,我得到大约600ms的结果。当我用本书的实现来运行它时,我得到了大约1900ms。差异怎么可能这么大?当然,我的实施肯定有问题。
我的实施:
public class Heap<T extends Comparable<? super T>> {
private T[] array = (T[])new Comparable[10];
private int size = 0;
public void insert(T data) {
if(size+1 > array.length) expandArray();
array[size++] = data;
int pos = size-1;
T temp;
while(pos != 0 && array[pos].compareTo(array[pos/2]) < 0) {
temp = array[pos/2];
array[pos/2] = array[pos];
array[pos] = temp;
pos /= 2;
}
}
private void expandArray() {
T[] newArray = (T[])new Comparable[array.length*2];
for(int i = 0; i < array.length; i++)
newArray[i] = array[i];
array = newArray;
}
}
本书的实施:
public class BooksHeap<AnyType extends Comparable<? super AnyType>>
{
private static final int DEFAULT_CAPACITY = 10;
private int currentSize;
private AnyType [ ] array;
public BinaryHeap( )
{
this( DEFAULT_CAPACITY );
}
public BinaryHeap( int capacity )
{
currentSize = 0;
array = (AnyType[]) new Comparable[ capacity + 1 ];
}
public void insert( AnyType x )
{
if( currentSize == array.length - 1 )
enlargeArray( array.length * 2 + 1 );
int hole = ++currentSize;
for( array[ 0 ] = x; x.compareTo( array[ hole / 2 ] ) < 0; hole /= 2 )
array[ hole ] = array[ hole / 2 ];
array[ hole ] = x;
}
private void enlargeArray( int newSize )
{
AnyType [] old = array;
array = (AnyType []) new Comparable[ newSize ];
for( int i = 0; i < old.length; i++ )
array[ i ] = old[ i ];
}
}
编辑:本书是由Mark Allen Weiss撰写的“Java中的数据结构和算法分析”。第三版。 ISBN:0-273-75211-1。
答案 0 :(得分:9)
此处,您的代码使用JMH测量:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@OperationsPerInvocation(Measure.SIZE)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@State(Scope.Thread)
@Fork(1)
public class Measure
{
static final int SIZE = 4_000_000;
private Random rnd;
@Setup public void setup() {
rnd = new Random();
}
@Benchmark public Object heap() {
Heap<Integer> heap = new Heap<>();
for (int i = 0; i < SIZE; i++) heap.insert(rnd.nextInt());
return heap;
}
@Benchmark public Object booksHeap() {
BooksHeap<Integer> heap = new BooksHeap<>();
for (int i = 0; i < SIZE; i++) heap.insert(rnd.nextInt());
return heap;
}
public static class Heap<T extends Comparable<? super T>> {
private T[] array = (T[])new Comparable[10];
private int size = 0;
public void insert(T data) {
if(size+1 > array.length) expandArray();
array[size++] = data;
int pos = size-1;
T temp;
while(pos != 0 && array[pos].compareTo(array[pos/2]) < 0) {
temp = array[pos/2];
array[pos/2] = array[pos];
array[pos] = temp;
pos /= 2;
}
}
private void expandArray() {
T[] newArray = (T[])new Comparable[array.length*2];
for (int i = 0; i < array.length; i++)
newArray[i] = array[i];
array = newArray;
}
}
public static class BooksHeap<AnyType extends Comparable<? super AnyType>>
{
private static final int DEFAULT_CAPACITY = 10;
private int currentSize;
private AnyType [ ] array;
public BooksHeap()
{
this( DEFAULT_CAPACITY );
}
public BooksHeap( int capacity )
{
currentSize = 0;
array = (AnyType[]) new Comparable[ capacity + 1 ];
}
public void insert( AnyType x )
{
if( currentSize == array.length - 1 )
enlargeArray( array.length * 2 + 1 );
int hole = ++currentSize;
for( array[ 0 ] = x; x.compareTo( array[ hole / 2 ] ) < 0; hole /= 2 )
array[ hole ] = array[ hole / 2 ];
array[ hole ] = x;
}
private void enlargeArray( int newSize )
{
AnyType [] old = array;
array = (AnyType []) new Comparable[ newSize ];
for( int i = 0; i < old.length; i++ )
array[ i ] = old[ i ];
}
}
}
结果:
Benchmark Mode Cnt Score Error Units
Measure.booksHeap avgt 5 62,712 ± 23,633 ns/op
Measure.heap avgt 5 62,784 ± 44,228 ns/op
它们完全一样。
练习的道德:不要以为你只能写一个循环并称之为基准。在像HotSpot这样的复杂,自我优化的运行时内测量任何有意义的东西是一项难以置信的难度挑战,最好留给像JMH这样的专家基准测试工具。
作为旁注,如果使用System.arraycopy
而不是手动循环,则可以减少约20%的时间(在两种实现中)。令人尴尬的是,这不是我的想法 - IntelliJ IDEA的自动检查建议,并自行转换代码:)
答案 1 :(得分:2)
对这个问题的实现测试进行测试,如何测试这些实现可以解释很多差异,请考虑这个例子。当我将Heap放在一个名为OPHeap
的类中,并将该书的堆放在一个名为BookHeap
的类中,然后按此顺序进行测试:
import java.util.Random;
public class Test {
public static void main(String ...args) {
{
Random rnd = new Random();
BookHeap<Integer> heap = new BookHeap<Integer>();
long startTime = System.currentTimeMillis();
for(int i = 0; i < 1_000_000_0; i++) heap.insert(rnd.nextInt(1000));
System.out.println("Book's Heap:" + (System.currentTimeMillis() - startTime));
}
{
Random rnd = new Random();
OPHeap<Integer> heap = new OPHeap<Integer>();
long startTime = System.currentTimeMillis();
for(int i = 0; i < 1_000_000_0; i++) heap.insert(rnd.nextInt(1000));
System.out.println(" OP's Heap:" + (System.currentTimeMillis() - startTime));
}
}
}
我得到了这个输出:
Book's Heap:1924
OP's Heap:1171
然而,当我交换测试的顺序时,我得到了这个输出:
OP's Heap:1867
Book's Heap:1515
这称为“热身”,你可以从this article学到很多方法来处理它。此外,只要您在测试中使用Random,就应该定义种子值,这样您的“伪随机”结果是可预测的。