I have two similar pieces of code below.
First code:
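#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(void) {
    unsigned long size = 256*1024*1024;
    unsigned long stride = 256;
    char *array = malloc(size);   /* char* so byte offsets are standard C */
    unsigned long off;            /* declared outside the loop: used after it */
    unsigned int offset = 0;      /* current position in the chain */
    /* Each slot stores the byte offset of the next slot. */
    for (off = 0; off < size; off += stride) {
        *(unsigned int*)(array+off) = off+stride;
    }
    /* The last slot points back to the start, so every load stays in bounds. */
    *(unsigned int*)(array+off-stride) = 0;
    int i = 10000000;
    struct timeval start, end;
    gettimeofday(&start, NULL);
    while (i >= 1) {
        offset = *(unsigned int*)(array+offset);   /* next address depends on this load */
        i--;
    }
    gettimeofday(&end, NULL);
    *(volatile unsigned int*)(array+offset);   /* keeps offset live */
    /* Elapsed microseconds; cast to double to match %.2f. */
    printf("%.2f\n", (double)((end.tv_sec-start.tv_sec)*1000000+(end.tv_usec-start.tv_usec)));
    return 0;
}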
Second code:
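#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(void) {
    unsigned long size = 256*1024*1024;
    unsigned long stride = 256;
    char *array = malloc(size);   /* char* so byte offsets are standard C */
    unsigned long off;            /* declared outside the loop: used after it */
    unsigned int offset = 0;      /* current position in the chain */
    /* Each slot stores the byte offset of the next slot. */
    for (off = 0; off < size; off += stride) {
        *(unsigned int*)(array+off) = off+stride;
    }
    /* The last slot points back to the start, so every load stays in bounds. */
    *(unsigned int*)(array+off-stride) = 0;
    int i = 10000000;
    struct timeval start, end;
    gettimeofday(&start, NULL);
/* ONE is a single dependent load; HUNDRED expands to 100 of them. */
#define ONE offset = *(unsigned int*)(array+offset);
#define FIVE ONE ONE ONE ONE ONE
#define TEN FIVE FIVE
#define FIFTY TEN TEN TEN TEN TEN
#define HUNDRED FIFTY FIFTY
    /* 10 x HUNDRED = 1000 loads per iteration, so i drops by 1000. */
    while (i >= 1000) {
        HUNDRED
        HUNDRED
        HUNDRED
        HUNDRED
        HUNDRED
        HUNDRED
        HUNDRED
        HUNDRED
        HUNDRED
        HUNDRED
        i -= 1000;
    }
    gettimeofday(&end, NULL);
    *(volatile unsigned int*)(array+offset);   /* keeps offset live */
    /* Elapsed microseconds; cast to double to match %.2f. */
    printf("%.2f\n", (double)((end.tv_sec-start.tv_sec)*1000000+(end.tv_usec-start.tv_usec)));
    return 0;
}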
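Question
The only difference between the two codes is the while loop. Both measure the elapsed time of that loop.
The first code takes 779,851,000 ns and the second takes 1,624,344,000 ns (about 2.1 times longer).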
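I suspected the difference came from L1-i cache misses, so I measured L1-i cache misses with perf.
However, the first code has 34,541 L1-i cache misses and the second has 43,078 (about 1.2 times more).
This result cannot fully explain the difference in the elapsed time of the while loop.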
What causes such a large difference in elapsed time between the two codes?
Am I missing something?