Question

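I am testing two nearly identical programs that differ only slightly in one for loop. The first uses three nested loops that iterate over the indices in the order y, z, x, while the second iterates in the order x, z, y.

My question is: why do the user time and the wall-clock time differ? Is it because of the memory locations accessed by one program compared with the other?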

test_1.c:

#include <stdlib.h>   /* for exit() */

#define N 1000

// Matrix definition
long long int A[N][N],B[N][N],R[N][N];

int main()
{
    int x,y,z;
    char str[100];

/*Matrix initialization*/ 
    for(y=0;y<N;y++) 
        for(x=0;x<N;x++)
        {
            A[y][x]=x;
            B[y][x]=y;
            R[y][x]=0;
        }
/*Matrix multiplication*/
    for(y=0;y<N;y++)
        for(z=0;z<N;z++) 
            for(x=0;x<N;x++) 
            {
                R[y][x]+= A[y][z] * B[z][x];
            }   
    exit(0);
}

The second program (test_2.c) differs only in the last (multiplication) loop:

for(x=0;x<N;x++)
    for(z=0;z<N;z++) 
        for(y=0;y<N;y++) 
        {
            R[y][x]+= A[y][z] * B[z][x];
        }

If I run /usr/bin/time -v ./test_1, I get the following statistics:

Command being timed: "./test_1"
User time (seconds): 5.19
System time (seconds): 0.01
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.22

Running /usr/bin/time -v ./test_2 gives the following statistics:

Command being timed: "./test_2"
User time (seconds): 7.75
System time (seconds): 0.00
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.76

Answer 1

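Basically, you are accessing memory in different patterns. Your first approach is much more cache-friendly, because you access a lot of data in one region of memory before moving on to the next block, and so on. C stores a two-dimensional array such as R[N][N] row by row, so with x in the innermost loop, R[y][x] and B[z][x] walk through consecutive addresses (and A[y][z] stays fixed), whereas with y in the innermost loop, every step of R[y][x] and A[y][z] jumps a whole row of N elements ahead.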

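If you want a real-world analogy, imagine you are delivering flyers to 10 different streets (A-J), each with house numbers 1-10. You could deliver to A1, A2, A3 ... A10, then B1, B2, B3 ... B10, and so on, or you could deliver to A1, B1, C1 ... J1, then A2, B2, C2 ... and so on. Clearly, the first way is more efficient. Computer memory works the same way: accessing memory "near" memory you accessed recently is more efficient than jumping around.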

Nearly identical code, different running times: why?
