Question

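To my surprise, I get a longer time (10 ms) when I "optimize" the multiplication by precomputing the results in an array, compared with the original 8 ms. Is this just a Java quirk, or is it true of PC architectures in general? I have a Core i5 760 with Java 7 on Windows 8 64-bit.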

public class Test {
    public static void main(String[] args)  {
        long start = System.currentTimeMillis();
        long sum=0;
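        // Precompute the squares 1..999 into a lookup table.
        int[] sqr = new int[1000];
        for(int a=1;a<1000;a++) {sqr[a]=a*a;}

        for(int b=1;b<1000;b++)
//          original version (8 ms): multiply inside the inner loop
//          for(int a=1;a<1000;a++) {sum+=a*a+b*b;}
            // "optimized" version (10 ms): read the precomputed squares instead
            for(int a=1;a<1000;a++) {sum+=sqr[a]+sqr[b];}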
        System.out.println(System.currentTimeMillis()-start+"ms");
        System.out.println(sum);
    }
}
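A minimal sketch that separates the two variants above into methods (the names `SquareSumCheck`, `multiplySum`, `lookupSum` and `squares` are hypothetical), so you can confirm they compute the same sum and time each one only after a warm-up loop has given the JIT a chance to compile them. The warm-up count of 20 and the use of `System.nanoTime` are assumptions for illustration, not the asker's setup, and a single timing like this is still noisy.

```java
public class SquareSumCheck {
    // Original variant from the question: multiply inside the inner loop.
    static long multiplySum() {
        long sum = 0;
        for (int b = 1; b < 1000; b++)
            for (int a = 1; a < 1000; a++)
                sum += a * a + b * b;
        return sum;
    }

    // "Optimized" variant from the question: read precomputed squares.
    static long lookupSum(int[] sqr) {
        long sum = 0;
        for (int b = 1; b < 1000; b++)
            for (int a = 1; a < 1000; a++)
                sum += sqr[a] + sqr[b];
        return sum;
    }

    // Builds the lookup table exactly as the question does (index 0 unused).
    static int[] squares() {
        int[] sqr = new int[1000];
        for (int a = 1; a < 1000; a++) sqr[a] = a * a;
        return sqr;
    }

    public static void main(String[] args) {
        int[] sqr = squares();
        long check = 0;
        // Assumed warm-up: 20 rounds is an arbitrary choice, not a tuned value.
        // The results are accumulated and printed so they are not dead code.
        for (int round = 0; round < 20; round++) {
            check += multiplySum() + lookupSum(sqr);
        }
        long t0 = System.nanoTime();
        long m = multiplySum();
        long t1 = System.nanoTime();
        long l = lookupSum(sqr);
        long t2 = System.nanoTime();
        System.out.println("multiply: " + (t1 - t0) / 1000 + " us, sum=" + m);
        System.out.println("lookup:   " + (t2 - t1) / 1000 + " us, sum=" + l);
        System.out.println("warm-up checksum: " + check);
    }
}
```

Both methods should report the same sum; only the relative timings after warm-up say anything about multiplication versus lookup.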

Answer 1

Konrad Rudolph commented on the issues with the benchmark, so I'll ignore the benchmark and focus on the question itself:

Is multiplication faster than array access?

Yes, very likely. Twenty or thirty years ago it used to be the other way around.

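Roughly speaking, an integer multiplication takes about 3 cycles (a pessimistic figure, assuming you don't get vector instructions), while a memory access takes 4 cycles if it hits the L1 cache directly, and it goes steeply downhill from there. For reference, see: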

Intel 64 and IA-32 Architectures Optimization Reference Manual
Approximate cost to access various caches and main memory?
Herb Sutter's talk on this very subject: Machine Architecture: Things Your Programming Language Never Told You

A Java-specific issue was pointed out by Ingo in a comment below: Java also performs bounds checking, which makes the already slower array access slower still.
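To make the bounds-check point concrete: every Java array read verifies that 0 <= index < length and throws `ArrayIndexOutOfBoundsException` otherwise. The sketch below (the class `BoundsCheckDemo` and method `inBounds` are hypothetical names) simply observes that check from the outside. HotSpot can often eliminate the check inside loops whose index is provably in range, such as `for (int i = 0; i < a.length; i++)`, so its cost depends on how the array is indexed.

```java
public class BoundsCheckDemo {
    // Reads a[i]. The JVM checks 0 <= i < a.length on this access and
    // throws ArrayIndexOutOfBoundsException if the check fails.
    static boolean inBounds(int[] a, int i) {
        try {
            int value = a[i];
            return value == value; // always true; just uses the loaded value
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int[] sqr = new int[1000];
        System.out.println(inBounds(sqr, 999));  // last valid index
        System.out.println(inBounds(sqr, 1000)); // index == length: rejected
        System.out.println(inBounds(sqr, -1));   // negative index: rejected
    }
}
```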

Answer 2

A more reasonable benchmark would be:

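import java.math.BigDecimal;
import java.util.Random;

public abstract class Benchmark {

    final String name;

    // Written after each timed run so the JIT cannot treat run()'s result as dead code.
    static volatile int sink;

    public Benchmark(String name) {
        this.name = name;
    }

    abstract int run(int iterations) throws Throwable;

    // Roughly doubles the iteration count until one timed run takes at least 1 s,
    // then returns the time per run(1) call, i.e. per pass over the table, in ns.
    private BigDecimal time() {
        try {
            int nextI = 1;
            int i;
            long duration;
            do {
                i = nextI;
                long start = System.nanoTime();
                sink = run(i);
                duration = System.nanoTime() - start;
                nextI = (i << 1) | 1;
            } while (duration < 1000000000 && nextI > 0);
            return new BigDecimal((duration) * 1000 / i).movePointLeft(3);
        } catch (Throwable e) {
            throw new RuntimeException(e);
        }
    }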

    @Override
    public String toString() {
        return name + "\t" + time() + " ns";
    }

    private static void shuffle(int[] a) {
        Random chaos = new Random();
        for (int i = a.length; i > 0; i--) {
            int r = chaos.nextInt(i);
            int t = a[r];
            a[r] = a[i - 1];
            a[i - 1] = t;
        }
    }


    public static void main(String[] args) throws Exception {
        final int[] table = new int[1000];
        final int[] permutation = new int[1000];

        for (int i = 0; i < table.length; i++) {
            table[i] = i * i;
            permutation[i] = i;
        }
        shuffle(permutation);

        Benchmark[] marks = {
            new Benchmark("sequential multiply") {
                @Override
                int run(int iterations) throws Throwable {
                    int sum = 0;
                    for (int j = 0; j < iterations; j++) {
                        for (int i = 0; i < table.length; i++) {
                            sum += i * i;
                        }
                    }
                    return sum;
                }
            },
            new Benchmark("sequential lookup") {
                @Override
                int run(int iterations) throws Throwable {
                    int sum = 0;
                    for (int j = 0; j < iterations; j++) {
                        for (int i = 0; i < table.length; i++) {
                            sum += table[i];
                        }
                    }
                    return sum;
                }
            },
            new Benchmark("random order multiply") {
                @Override
                int run(int iterations) throws Throwable {
                    int sum = 0;
                    for (int j = 0; j < iterations; j++) {
                        for (int i = 0; i < table.length; i++) {
                            sum += permutation[i] * permutation[i];
                        }
                    }
                    return sum;
                }
            },
            new Benchmark("random order lookup") {
                @Override
                int run(int iterations) throws Throwable {
                    int sum = 0;
                    for (int j = 0; j < iterations; j++) {
                        for (int i = 0; i < table.length; i++) {
                            sum += table[permutation[i]];
                        }
                    }
                    return sum;
                }
            }
        };

        for (Benchmark mark : marks) {
            System.out.println(mark);
        }
    }
}

On my Intel Core Duo (yes, it's old), this prints:

sequential multiply    2218.666 ns
sequential lookup      1081.220 ns
random order multiply  2416.923 ns
random order lookup    2351.293 ns

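So if I access the lookup array sequentially (minimizing the number of cache misses) and let the HotSpot JVM optimize away the bounds checks on the array accesses, the lookup is faster for a 1000-element array (about 1081 ns versus 2219 ns per pass). If we access the array in random order, that advantage disappears. Moreover, the lookup gets slower as the table grows. For instance, with 10000 elements I get: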

sequential multiply    23192.236 ns
sequential lookup      12701.695 ns
random order multiply  24459.697 ns
random order lookup    31595.523 ns

So unless the access pattern is (nearly) sequential and the lookup array is small, an array lookup is not faster than a multiplication.

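In any case, my measurements show that a multiplication (plus an addition) takes only about 4 processor cycles (2.3 ns per loop iteration on a 2 GHz CPU). You are unlikely to get much faster than that. Moreover, unless you perform on the order of half a billion multiplications per second, multiplication is not your bottleneck, and optimizing other parts of your code will be more fruitful.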
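The arithmetic behind the "about 4 cycles" figure can be reproduced from the second table: the benchmark reports nanoseconds per pass over the table, so dividing by the table size gives nanoseconds per iteration, and multiplying by the clock rate in GHz gives cycles. The 2 GHz clock is the figure stated above; the class and method names below are hypothetical helpers.

```java
public class CycleEstimate {
    // Converts a measured time per pass over n elements into ns per element.
    static double nsPerElement(double nsPerPass, int n) {
        return nsPerPass / n;
    }

    // Cycles per element: a clock of g GHz runs g cycles per nanosecond.
    static double cyclesPerElement(double nsPerElement, double clockGHz) {
        return nsPerElement * clockGHz;
    }

    // How many such iterations fit into one second.
    static double iterationsPerSecond(double nsPerElement) {
        return 1e9 / nsPerElement;
    }

    public static void main(String[] args) {
        // "sequential multiply" with 10000 elements, from the table above.
        double ns = nsPerElement(23192.236, 10000);
        System.out.printf("%.2f ns/iteration%n", ns);
        System.out.printf("%.1f cycles/iteration at 2 GHz%n", cyclesPerElement(ns, 2.0));
        System.out.printf("%.0f million iterations/s%n", iterationsPerSecond(ns) / 1e6);
    }
}
```

At roughly 2.3 ns per iteration, one core manages a few hundred million multiply-add iterations per second, which is where the "half a billion" threshold comes from.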
