Question

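I'm writing code in C++, and I can choose between running 1 for loop with 4 addition operations, or running 4 separate for loops with 1 addition operation each. (As a side note, I'm considering this because the 4-loops-1-addition version means I allocate 1/4 of the memory in the program I'm writing.)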

Instinctively, I expected 1-loop-4-additions to be faster, and a quick benchmark confirmed it: 1-loop-4-additions took half the time of 4-loops-1-addition.

My question: what is actually going on that produces this difference?

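Below is the code I used for testing. I'm a mathematician, not a programmer, so there's a chance I'm doing something stupid. I'm using 2D arrays because that's what my real code uses.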

#include <stdio.h>
#include <ctime>
#include <iostream>

using namespace std;

int main(){
    const int Nx=100;  // const so the arrays below are standard C++ arrays, not variable-length arrays
    const int Ny=Nx;
    double holder=0;

    double test[Nx][Ny];
    double test1[Nx][Ny];
    double test2[Nx][Ny];
    double test3[Nx][Ny];
    double test4[Nx][Ny];

    for(int i=0;i<Nx;i++){
        for(int j=0;j<Ny;j++){
            test[i][j]=1;
            test1[i][j]=1;
            test2[i][j]=1;
            test3[i][j]=1;
            test4[i][j]=1;
        }
    }
    clock_t begin= clock();
    for(int i=0;i<Nx;i++){
        for(int j=0;j<Ny;j++){
            holder=holder + test[i][j];
        }
    }
    for(int i=0;i<Nx;i++){
        for(int j=0;j<Ny;j++){
            holder=holder + test[i][j];
        }
    }
    for(int i=0;i<Nx;i++){
        for(int j=0;j<Ny;j++){
            holder=holder + test[i][j];
        }
    }
    for(int i=0;i<Nx;i++){
        for(int j=0;j<Ny;j++){
            holder=holder + test[i][j];
        }
    }
    clock_t end = clock();
    double elapsed = (double) (end-begin)/CLOCKS_PER_SEC;

    cout<<"Time to run 1 addition in 4 for loops="<<elapsed<<endl;

    begin= clock();
    for(int i=0;i<Nx;i++){
        for(int j=0;j<Ny;j++){
            holder=holder + test1[i][j]+ test2[i][j]+ test3[i][j]+ test4[i][j];
        }
    }
    end = clock();
    elapsed = (double) (end-begin)/CLOCKS_PER_SEC;

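    cout<<"Time to run 4 additions in 1 for loop="<<elapsed<<endl;

    // Print the result so an optimizing compiler cannot discard the loops as dead code.
    cout<<"holder="<<holder<<endl;
}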

Answer 1

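With the first option you perform 4 * (Nx * Ny) loop iterations, and with the second option only Nx * Ny, so it is normal that the second one finishes faster, increasingly so for larger Nx and Ny.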

Answer 2

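Let's take a closer look at the two loops to see what is happening. If you look carefully, the number of additions on the 2D array elements is the same in both. However, the two approaches differ in how many times they evaluate the comparisons i<Nx and j<Ny and increment i and j: the first approach does this 4 times as often.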

This may be one of the reasons for the difference in execution time between the two approaches.

How long does it take to start a for loop?
