Question

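I tried to demonstrate the difference in execution time between List.contains() and a manual search, and the result was striking. Here is the code: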

import java.util.ArrayList;
import java.util.List;

// The class name is added only so the snippet compiles on its own.
public class ContainsTiming {
    public static void main(String[] argv) {
        List<String> list = new ArrayList<String>();
        list.add("a");
        list.add("a");
        list.add("a");
        list.add("a");
        list.add("a");
        list.add("a");
        list.add("b");

        // First measurement: the built-in contains().
        long startTime = System.nanoTime();

        list.contains("b");

        long endTime = System.nanoTime();
        long duration = endTime - startTime;

        System.out.println("First run: " + duration);

        // Second measurement: manual search with an enhanced for loop.
        startTime = System.nanoTime();
        for (String s : list) {
            if (s.equals("b"))
                break;
        }
        endTime = System.nanoTime();

        duration = endTime - startTime;
        System.out.println("Second run: " + duration);
    }
}

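Output:

First run: 7500
Second run: 158685

1. How does the contains() function make such a big difference?
2. Which search algorithm does it use?
3. If the list contains the searched element, does the search stop at the first matching element?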

Answer 1

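First, it is not wise to trust results that come from a single test like this. There are too many variable factors, caching effects, and similar things to consider. You should instead write a test that uses some randomization across trials and performs the different checks millions of times, not just once.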
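As a rough illustration of that advice, here is a minimal sketch that warms up first and then averages many contains() calls over randomly chosen targets. The class name RepeatedTiming, the helper names, the list size, and the trial counts are all assumptions picked for the example, and the numbers it prints are still only indicative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration; names and sizes are assumptions.
class RepeatedTiming {
    // Builds a list holding the strings "0" .. "n-1".
    static List<String> buildList(int n) {
        List<String> list = new ArrayList<String>(n);
        for (int i = 0; i < n; i++) {
            list.add(Integer.toString(i));
        }
        return list;
    }

    // Average nanoseconds per contains() call over randomly chosen targets.
    static double averageNanos(List<String> list, int trials, long seed) {
        Random random = new Random(seed);
        String[] targets = new String[trials];
        for (int i = 0; i < trials; i++) {
            // Choose the targets before timing so the random number
            // generation is not part of the measurement.
            targets[i] = Integer.toString(random.nextInt(list.size()));
        }
        int hits = 0;
        long start = System.nanoTime();
        for (String target : targets) {
            if (list.contains(target)) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        // Using the result keeps the JIT from discarding the loop body.
        if (hits != trials) {
            throw new IllegalStateException("unexpected miss");
        }
        return (double) elapsed / trials;
    }

    public static void main(String[] args) {
        List<String> list = buildList(1000);
        averageNanos(list, 200000, 1L); // warm-up pass, result ignored
        System.out.println("contains: " + averageNanos(list, 1000000, 2L) + " ns/op");
    }
}
```

The same pattern can be repeated for the manual loop so that both variants are measured under the same conditions.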

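That said, I expect your results would stay the same. ArrayList implements contains() using its own indexOf() method, which loops directly over the underlying array it stores. You can see this for yourself in the ArrayList source.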
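To make that delegation concrete, here is a simplified sketch of an array-backed list whose contains() calls indexOf(). The class SimpleArrayList and its field names are hypothetical; it mirrors the general shape described above rather than reproducing the JDK source. It also shows that the search is a plain linear scan that returns at the first match.

```java
import java.util.Arrays;

// Hypothetical, simplified array-backed list; not the JDK's ArrayList source.
class SimpleArrayList<E> {
    private Object[] elementData = new Object[10];
    private int size;

    void add(E e) {
        if (size == elementData.length) {
            // Grow the backing array when it is full.
            elementData = Arrays.copyOf(elementData, size * 2);
        }
        elementData[size++] = e;
    }

    // Linear scan over the backing array; returns at the first match.
    int indexOf(Object o) {
        for (int i = 0; i < size; i++) {
            if (o == null ? elementData[i] == null : o.equals(elementData[i])) {
                return i;
            }
        }
        return -1;
    }

    // contains() is just "indexOf() found something".
    boolean contains(Object o) {
        return indexOf(o) >= 0;
    }

    public static void main(String[] args) {
        SimpleArrayList<String> list = new SimpleArrayList<String>();
        for (int i = 0; i < 6; i++) {
            list.add("a");
        }
        list.add("b");
        System.out.println(list.contains("b")); // true
        System.out.println(list.indexOf("a"));  // 0, the first of six matches
    }
}
```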

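The foreach loop, on the other hand, has to instantiate an Iterator, access the array through that iterator's methods, and in general do more work than ArrayList's own direct implementation. Again, though, you should benchmark it more thoroughly!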
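For reference, an enhanced for loop over a List behaves like an explicit Iterator loop. The sketch below writes both forms side by side; the class name ForEachDesugared and the two method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical class for illustration only.
class ForEachDesugared {
    // The enhanced for loop, as written in the question.
    static boolean foundWithForEach(List<String> list, String target) {
        for (String s : list) {
            if (s.equals(target)) {
                return true;
            }
        }
        return false;
    }

    // Roughly what the compiler produces for the loop above:
    // one iterator() call, then hasNext() and next() per element.
    static boolean foundWithIterator(List<String> list, String target) {
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            String s = it.next();
            if (s.equals(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        for (int i = 0; i < 6; i++) {
            list.add("a");
        }
        list.add("b");
        System.out.println(foundWithForEach(list, "b"));
        System.out.println(foundWithIterator(list, "b"));
    }
}
```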

Answer 2

As you can see from the code, contains needs O(n) iterations. If you re-implement your for loop as:

for(int i=0; i < list.size(); i++){
    if(list.get(i).equals("b"))
        break;
}

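you will see a significant improvement in search time. So you can blame the List iterator for the time overhead. Instantiating the Iterator and calling its next and hasNext methods account for the extra time (about 150 microseconds in the question's single measurement).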

Answer 3

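Writing a correct microbenchmark is hard. If you use a better benchmark, you will likely find that the difference between the approaches is minor. At least, the following benchmark is far more robust, and it shows only about a 10% difference in execution time between the two approaches: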

import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public abstract class Benchmark {

    final String name;

    public Benchmark(String name) {
        this.name = name;
    }

    abstract int run(int iterations) throws Throwable;

    private BigDecimal time() {
        try {
            int nextI = 1;
            int i;
            long duration;
            do {
                i = nextI;
                long start = System.nanoTime();
                run(i);
                duration = System.nanoTime() - start;
                nextI = (i << 1) | 1;
            } while (duration < 1000000000 && nextI > 0);
            return new BigDecimal((duration) * 1000 / i).movePointLeft(3);
        } catch (Throwable e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public String toString() {
        return name + "\t" + time() + " ns";
    }

    public static void main(String[] args) throws Exception {
        final List<String> list = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            list.add("a");
        }

        Benchmark[] marks = {
            new Benchmark("contains") {
                @Override
                int run(int iterations) throws Throwable {
                    for (int i = 0; i < iterations; i++) {
                        if (list.contains("b")) {
                            return 1;
                        }
                    }
                    return 0;
                }
            },
            new Benchmark("loop") {
                @Override
                int run(int iterations) throws Throwable {
                    for (int i = 0; i < iterations; i++) {
                        for (String s : list) {
                            if (s.equals("b")) {
                                return 1;
                            }
                        }
                    }
                    return 0;
                }
            }
        };

        for (Benchmark mark : marks) {
            System.out.println(mark);
        }
    }
}

This prints (on my dated notebook, using a Java 7 Oracle JVM in server mode):

contains    10150.420 ns
loop        11363.640 ns

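The slightly larger overhead of the loop is probably caused by the iterator checking for concurrent modification, and checking for the end of the list twice, on every access. See the source code of java.util.ArrayList.Itr.next() for details.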
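The per-element checks described here can be sketched as follows. This is a hypothetical, simplified iterator over an array-backed list, written only to show the kind of work done on each step; the class CheckedArrayList and its fields are assumptions, not a copy of java.util.ArrayList.Itr.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical, simplified list used only to illustrate the iterator checks.
class CheckedArrayList implements Iterable<String> {
    private final String[] elementData;
    private int size;
    private int modCount; // incremented on every structural modification

    CheckedArrayList(String... items) {
        elementData = items.clone();
        size = items.length;
    }

    void removeLast() {
        if (size == 0) {
            throw new IllegalStateException("list is empty");
        }
        elementData[--size] = null;
        modCount++;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int cursor;
            private final int expectedModCount = modCount;

            @Override
            public boolean hasNext() {
                return cursor != size; // end-of-list check, first time
            }

            @Override
            public String next() {
                // Concurrent modification check on every access.
                if (modCount != expectedModCount) {
                    throw new ConcurrentModificationException();
                }
                // End-of-list check, second time.
                if (cursor >= size) {
                    throw new NoSuchElementException();
                }
                return elementData[cursor++];
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static void main(String[] args) {
        CheckedArrayList list = new CheckedArrayList("a", "a", "b");
        for (String s : list) {
            System.out.println(s);
        }
    }
}
```

A direct scan over the backing array needs only the loop bound check per element, which is consistent with the small but measurable gap above.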

Edit: With very short lists, the difference is more pronounced. For instance, for a list of length 1:

contains    15.316 ns
loop        69.401 ns

Still, that is nowhere near the roughly 20:1 ratio that your measurement indicates...
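To put the figures quoted in this thread side by side, the small sketch below computes the loop-to-contains ratio for each measurement: roughly 21:1 for the question's single run, about 1.12:1 for the 1000-element benchmark, and about 4.5:1 for the one-element list. The class name TimingRatios is made up; the numbers are the ones printed above.

```java
// Hypothetical helper; the timings are copied from the outputs quoted above.
class TimingRatios {
    // Ratio of the slower timing to the faster one.
    static double ratio(double slowerNanos, double fasterNanos) {
        return slowerNanos / fasterNanos;
    }

    public static void main(String[] args) {
        System.out.printf("question, single run:      %.1f : 1%n", ratio(158685, 7500));
        System.out.printf("benchmark, 1000 elements:  %.2f : 1%n", ratio(11363.640, 10150.420));
        System.out.printf("benchmark, 1 element:      %.1f : 1%n", ratio(69.401, 15.316));
    }
}
```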

Java: performance difference between List.contains() and manual search
