Why is this sequential array loop slower than a loop that uses a "lookup" array?

Time: 2013-12-10 13:48:23

Tags: c arrays loops

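I have recently been studying cache locality, and I am trying to understand how the CPU accesses memory. I wrote an experiment to see whether there is a performance difference between looping over an array sequentially and indexing into the data array through some kind of lookup table. I was surprised to find that the lookup method is slightly faster. My code is below. I compiled it with GCC on Windows (MinGW).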

#include <stdlib.h>
#include <stdio.h>
#include <windows.h>

int main()
{
    DWORD dwElapsed, dwStartTime;

    //random arrangement of keys to lookup
    int lookup_arr[] = {0, 3, 8, 7, 2, 1, 4, 5, 6, 9};

    //data for both loops
    int data_arr1[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int data_arr2[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    //first loop, sequential access
    dwStartTime = GetTickCount();
    for (int n = 0; n < 9000000; n++) {
        for (int i = 0; i < 10; i++)
            data_arr1[i]++;
    }
    dwElapsed = GetTickCount() - dwStartTime;
    printf("Normal loop completed: %lu\n", (unsigned long)dwElapsed); // DWORD is unsigned, so %d is the wrong specifier

    //second loop, indexes into data_arr2 using the lookup array
    dwStartTime = GetTickCount();
    for (int n = 0; n < 9000000; n++) {
        for (int i = 0; i < 10; i++)
            data_arr2[lookup_arr[i]]++;
    }
    dwElapsed = GetTickCount() - dwStartTime;
    printf("Lookup loop completed: %lu\n", (unsigned long)dwElapsed); // DWORD is unsigned, so %d is the wrong specifier

    return 0;
}

Running this, I get:

Normal loop completed: 375
Lookup loop completed: 297

2 Answers:

Answer 0 (score: 2)

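Following up on my earlier comment, here is how you could do this.

  1. Repeat the measurement
  2. Estimate the error
  3. Use a large block of memory
  4. Compare randomized vs. linear indexing (so there is an indirection either way)
  5. The result is a significant speed difference with "random indexing".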
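
Item 3 matters because each array in the question holds only 10 `int` values, so the whole data set stays in the fastest cache level whatever order it is touched in. A minimal sketch of that arithmetic follows. The helper names `working_set_bytes` and `min_cache_lines` are hypothetical, and the cache line size is an assumption the caller supplies (64 bytes is common on x86, but check your own CPU).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes occupied by n elements of elem_size bytes each. */
size_t working_set_bytes(size_t n, size_t elem_size) {
    return n * elem_size;
}

/* Hypothetical helper: minimum number of cache lines that can hold `bytes`,
 * rounded up. line_bytes is an assumption supplied by the caller. */
size_t min_cache_lines(size_t bytes, size_t line_bytes) {
    assert(line_bytes > 0); /* avoid dividing by zero */
    return (bytes + line_bytes - 1) / line_bytes;
}
```

With a 4-byte `int`, `working_set_bytes(10, sizeof(int))` is 40 bytes, which can fit in a single 64-byte line, while `working_set_bytes(1000000, sizeof(int))` is 4,000,000 bytes per array, well beyond a typical L1 data cache of a few tens of kilobytes.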

    #include <stdio.h>
    #include <time.h>
    #include <stdlib.h>
    #include <math.h>
    
    #define N 1000000
    
    int main(void) {
      int *rArr;
      int *rInd; // randomized indices
      int *lInd; // linear indices
      int ii;
    
      rArr = malloc(N * sizeof(int) );
      rInd = malloc(N * sizeof(int) );
      lInd = malloc(N * sizeof(int) );
    
      for(ii = 0; ii < N; ii++) {
        lInd[ii] = ii;
        rArr[ii] = rand();
        rInd[ii] = rand()%N;
      }
    
      int loopCount;
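      int sum = 0; // was uninitialized; printed later so the loops have a visible result
      clock_t startT, stopT; // clock() returns clock_t, not time_t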
      double dt, totalT=0, tt2=0;
    
      startT = clock();
      for(loopCount = 0; loopCount < 100; loopCount++) {
        for(ii = 0; ii < N; ii++) {
          sum += rArr[lInd[ii]];
        }
        stopT = clock();
        dt = stopT - startT;
        totalT += dt;
        tt2 += dt * dt;
        startT = stopT;
      }
      printf("sum is %d\n", sum);
      printf("total time: %lf += %lf\n", totalT/(double)(CLOCKS_PER_SEC), sqrt((tt2 - totalT * totalT / 100.0)/100.0) / (double)(CLOCKS_PER_SEC)); // sqrt added to match the second loop's error estimate
    
      totalT = 0; tt2 = 0;
      startT = clock();
      for(loopCount = 0; loopCount < 100; loopCount++) {
        for(ii = 0; ii < N; ii++) {
          sum += rArr[rInd[ii]];
        }
        stopT = clock();
        dt = stopT - startT;
        totalT += dt;
        tt2 += dt * dt;
        startT = stopT;
      }
      printf("sum is %d\n", sum);
      printf("total time: %lf += %lf\n", totalT/(double)(CLOCKS_PER_SEC), sqrt((tt2 - totalT * totalT / 100.0)/100.0) / (double)(CLOCKS_PER_SEC));
    }
    

Result: sequential access is more than 2x faster (on my machine):

    sum is -1444272372
    total time: 0.396539 += 0.000219
    sum is 546230204
    total time: 0.756407 += 0.001165
    

With -O3 optimization the difference is even more pronounced, about 3x faster:

    sum is -318372465
    total time: 0.142444 += 0.013230
    sum is 1672130111
    total time: 0.455804 += 0.000402
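
The error estimate in item 2 comes from two running sums, `totalT` and `tt2` in the code above. As a rough sketch of the same formula on its own, the helpers below compute the mean and the population variance of a set of timing samples. The names `sample_mean` and `sample_variance` are hypothetical and are not part of the answer's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: arithmetic mean of n timing samples. */
double sample_mean(const double *samples, size_t n) {
    assert(samples != NULL && n > 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += samples[i];
    return total / (double)n;
}

/* Hypothetical helper: population variance from the same running sums
 * the answer keeps: variance = (sum(x^2) - sum(x)^2 / n) / n. */
double sample_variance(const double *samples, size_t n) {
    assert(samples != NULL && n > 0);
    double total = 0.0, total_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += samples[i];
        total_sq += samples[i] * samples[i];
    }
    double variance = (total_sq - total * total / (double)n) / (double)n;
    return variance < 0.0 ? 0.0 : variance; /* clamp tiny negative rounding error */
}
```

Taking `sqrt()` of the variance gives the standard deviation of a single pass, in the same units as the samples.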
    

Answer 1 (score: 1)

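I believe you are compiling without optimizations enabled. With -O2, g++ optimizes everything away and the run time is 0; without the flag, I get results similar to yours.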

After modifying the program so that the values in data_arr1 and data_arr2 are actually used, I get a value of 78 ms.
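
One way to make the values "actually used" is to return a sum of the array from each timed loop and print it. The sketch below assumes that approach; the answer does not show its modified program, and the helper names `run_sequential` and `run_lookup` are hypothetical. It does not guarantee that an optimizer will leave the loops untouched, only that the work has an observable result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the question's sequential loop, repeated `reps` times.
 * Returns the sum of the array so the increments feed into a visible result. */
long run_sequential(int *data, int len, long reps) {
    assert(data != NULL && len > 0);
    for (long n = 0; n < reps; n++)
        for (int i = 0; i < len; i++)
            data[i]++;
    long sum = 0;
    for (int i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* Hypothetical helper: the same work, but indexed through a lookup array.
 * `lookup` must contain valid indices into `data`. */
long run_lookup(int *data, const int *lookup, int len, long reps) {
    assert(data != NULL && lookup != NULL && len > 0);
    for (long n = 0; n < reps; n++)
        for (int i = 0; i < len; i++)
            data[lookup[i]]++;
    long sum = 0;
    for (int i = 0; i < len; i++)
        sum += data[i];
    return sum;
}
```

In the question's program, each call would sit between the two `GetTickCount()` calls, with the returned sum printed next to the elapsed time.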