Question

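I have a program that does a lot of matrix multiplication. I thought I would speed it up by reducing the number of loops in the code and see how much faster it gets (I'll try a matrix math library later). It turns out it isn't faster at all. I've been able to reproduce the problem with some sample code. My guess was that testOne() would be faster than testTwo() because it doesn't create any new arrays and because it runs only a third as many loop iterations. On my machine, it takes about twice as long to run: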

Duration for testOne with 5000 epochs: 657, loopCount: 64000000

Duration for testTwo with 5000 epochs: 365, loopCount: 192000000

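My guess is that multOne() is slower than multTwo() because in multOne() the CPU isn't writing to sequential memory addresses the way it is in multTwo(). Does that sound right? Any explanation would be appreciated.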

import java.util.Random;

public class ArrayTest {

    double[] arrayOne;
    double[] arrayTwo;
    double[] arrayThree;

    double[][] matrix;

    double[] input;
    int loopCount;

    int rows;
    int columns;

    public ArrayTest(int rows, int columns) {
        this.rows = rows;
        this.columns = columns;
        this.loopCount = 0;
        arrayOne = new double[rows];
        arrayTwo = new double[rows];
        arrayThree = new double[rows];
        matrix = new double[rows][columns];
        Random random = new Random();
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < columns; j++) {
                matrix[i][j] = random.nextDouble();
            }
        }
    }

    public void testOne(double[] input, int epochs) {
        this.input = input;
        this.loopCount = 0;
        long start = System.currentTimeMillis();
        long duration;
        for (int i = 0; i < epochs; i++) {
            multOne();
        }
        duration = System.currentTimeMillis() - start;
        System.out.println("Duration for testOne with " + epochs + " epochs: " + duration + ", loopCount: " + loopCount);
    }

    public void multOne() {
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < columns; j++) {
                arrayOne[i] += matrix[i][j] * arrayOne[i] * input[j];
                arrayTwo[i] += matrix[i][j] * arrayTwo[i] * input[j];
                arrayThree[i] += matrix[i][j] * arrayThree[i] * input[j];
                loopCount++;
            }
        }
    }

    public void testTwo(double[] input, int epochs) {

        this.loopCount = 0;
        long start = System.currentTimeMillis();
        long duration;
        for (int i = 0; i < epochs; i++) {
            arrayOne = multTwo(matrix, arrayOne, input);
            arrayTwo = multTwo(matrix, arrayTwo, input);
            arrayThree = multTwo(matrix, arrayThree, input);
        }
        duration = System.currentTimeMillis() - start;
        System.out.println("Duration for testTwo with " + epochs + " epochs: " + duration + ", loopCount: " + loopCount);
    }

    public double[] multTwo(double[][] matrix, double[] array, double[] input) {
        double[] newArray = new double[rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < columns; j++) {
                newArray[i] += matrix[i][j] * array[i] * input[j];
                loopCount++;
            }
        }
        return newArray;
    }

    public static void main(String[] args) {
        int rows = 100;
        int columns = 128;
        ArrayTest arrayTest = new ArrayTest(rows, columns);
        Random random = new Random();
        double[] input = new double[columns];
        for (int i = 0; i < columns; i++) {
            input[i] = random.nextDouble();
        }
        arrayTest.testOne(input, 5000);
        arrayTest.testTwo(input, 5000);
    }
}

Answer 1

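There is a simple reason why your tests take different amounts of time: they don't do the same thing. Since the two loops you are comparing are not functionally identical, the iteration count is not a good metric.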
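To make "not functionally identical" concrete, here is a minimal sketch that runs one in-place update in the style of multOne and one fresh-array update in the style of multTwo on the same data. The class name EquivalenceCheck, the method names, and the tiny 1x2 input are made up for illustration; they are not from the question's code.

```java
import java.util.Arrays;

class EquivalenceCheck {

    // multOne style: result[i] is both read and written on every j iteration.
    // (Works on a copy so the caller's array is left untouched.)
    static double[] inPlace(double[][] matrix, double[] array, double[] input) {
        double[] result = array.clone();
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < input.length; j++) {
                result[i] += matrix[i][j] * result[i] * input[j];
            }
        }
        return result;
    }

    // multTwo style: array[i] is only read, the sum goes into a fresh array.
    static double[] freshArray(double[][] matrix, double[] array, double[] input) {
        double[] result = new double[array.length];
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < input.length; j++) {
                result[i] += matrix[i][j] * array[i] * input[j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] matrix = {{1.0, 1.0}};
        double[] array = {1.0};
        double[] input = {1.0, 1.0};
        System.out.println(Arrays.toString(inPlace(matrix, array, input)));    // [4.0]
        System.out.println(Arrays.toString(freshArray(matrix, array, input))); // [2.0]
    }
}
```

Same inputs, different outputs, so comparing the two by loopCount alone says little about the work actually being done.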

testOne takes longer than testTwo because:

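In multOne, you update arrayOne[i] in place during every iteration of the j loop. This means that each iteration of the inner loop j uses a new value of arrayOne[i], computed in the previous iteration. That creates a loop-carried dependency, which is harder for the compiler to optimize, because the result of matrix[i][j] * arrayOne[i] * input[j] is needed on the very next CPU clock cycle. That isn't really possible with floating-point operations, which usually have a latency of several clock cycles, so the pipeline stalls and performance drops.
In testTwo, you update arrayOne only once per epoch iteration. Inside multTwo the value array[i] is only read, so the multiplication never depends on the result of the previous iteration, and the loop can be vectorized efficiently, which results in better cache and arithmetic performance.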
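The difference between the two dependency patterns can be sketched for a single row. This is only an illustration under my own naming (DependencyChain, serialRow and independentRow are hypothetical, not part of the question's code), and the second method factors x out of the sum, so on arbitrary doubles it can differ from multTwo in the last bits of rounding.

```java
class DependencyChain {

    // multOne style: every step needs the x produced by the previous step,
    // so the multiply-then-add chain has to run strictly one after another.
    static double serialRow(double[] matrixRow, double x, double[] input) {
        for (int j = 0; j < input.length; j++) {
            x += matrixRow[j] * x * input[j];
        }
        return x;
    }

    // multTwo style: x never changes inside the loop, so the products
    // matrixRow[j] * input[j] are independent of each other. The only value
    // carried between iterations is the running sum, and x is applied once.
    static double independentRow(double[] matrixRow, double x, double[] input) {
        double dot = 0.0;
        for (int j = 0; j < input.length; j++) {
            dot += matrixRow[j] * input[j];
        }
        return x * dot;
    }

    public static void main(String[] args) {
        double[] matrixRow = {0.5, 0.25};
        double[] input = {2.0, 4.0};
        System.out.println(serialRow(matrixRow, 2.0, input));      // 8.0
        System.out.println(independentRow(matrixRow, 2.0, input)); // 4.0
    }
}
```

In serialRow the multiplication cannot start until the previous addition has finished. In independentRow all the multiplications can be issued back to back, which is the kind of loop a JIT compiler or the CPU's out-of-order engine has a much easier time with.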

Why does reducing the number of loops not speed up the program?
