This simple C program first fills an array of 0xFFFFFF elements and then passes over it twice, measuring how long each pass takes:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define TESTSIZ 0xffffff

char testcases[TESTSIZ];

/* Fill the array with random 7-bit ASCII values. */
void gentestcases(void)
{
    size_t i = 0;
    while(i < TESTSIZ)
        testcases[i++] = rand()%128;
    return;
}

/* Nanoseconds elapsed between two timestamps. */
long long time_elapsed(struct timespec beg, struct timespec end)
{
    if(end.tv_nsec < beg.tv_nsec) {
        end.tv_nsec += 1000000000;
        end.tv_sec--;
    }
    return 1000000000ll*(end.tv_sec-beg.tv_sec) + end.tv_nsec-beg.tv_nsec;
}

/* Time one pass over the array.
   Note: func is ignored; the loop calls islower directly. */
long long test( int(*func)(int) )
{
    struct timespec beg, end;
    clock_gettime(CLOCK_MONOTONIC, &beg);
    int volatile sink;  /* volatile so the loop is not optimized away */
    size_t i = 0;
    while(i < TESTSIZ)
        sink = islower(testcases[i++]);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return time_elapsed(beg, end);
}

int main()
{
    gentestcases();
    printf("1st pass took %lld nsecs\n", test(islower));
    printf("2nd pass took %lld nsecs\n", test(islower));
    return 0;
}
I compile it with gcc -O2 -std=gnu89 -o sb sillybench.c
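Usually the second pass over the array is slower than the first. The effect is small (at most about 0.33 ms, roughly 2.5 %) but, with one exception, repeatable: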
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13098789 nsecs
2nd pass took 13114677 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13052105 nsecs
2nd pass took 13134187 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13118069 nsecs
2nd pass took 13074199 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13038579 nsecs
2nd pass took 13079995 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13070334 nsecs
2nd pass took 13324378 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13031000 nsecs
2nd pass took 13167349 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13019961 nsecs
2nd pass took 13310211 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13041332 nsecs
2nd pass took 13311737 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13030913 nsecs
2nd pass took 13177423 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13060570 nsecs
2nd pass took 13387024 nsecs
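To put a number on "small", here is a minimal sketch that recomputes the second-minus-first difference for each of the ten runs above. The timings are copied from the output; the function name summarize_runs is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Timings (ns) copied from the ten runs above. */
static const long long first_pass[] = {
    13098789, 13052105, 13118069, 13038579, 13070334,
    13031000, 13019961, 13041332, 13030913, 13060570
};
static const long long second_pass[] = {
    13114677, 13134187, 13074199, 13079995, 13324378,
    13167349, 13310211, 13311737, 13177423, 13387024
};

/* Prints each run's second-minus-first difference (ns and percent),
   stores the largest difference in *max_diff and returns the number of
   runs in which the second pass was slower. */
static int summarize_runs(long long *max_diff)
{
    size_t n = sizeof first_pass / sizeof first_pass[0];
    size_t i;
    int slower = 0;

    assert(sizeof first_pass == sizeof second_pass);
    *max_diff = 0;
    for (i = 0; i < n; i++) {
        long long diff = second_pass[i] - first_pass[i];
        printf("run %2d: %+7lld ns (%+.2f %%)\n", (int)(i + 1), diff,
               100.0 * (double)diff / (double)first_pass[i]);
        if (diff > 0)
            slower++;
        if (diff > *max_diff)
            *max_diff = diff;
    }
    return slower;
}
```

For these runs the second pass is slower 9 times out of 10, by between about 0.016 ms and 0.33 ms (at most about 2.5 %); run 3 is the exception, about 0.044 ms faster.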
Why does this happen? If anything, I would expect the first pass over the array to be slower, not the second!
In case it matters:
m@m-X555LJ ~/UVA/fastIO $ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
System: Host: m-X555LJ Kernel: 4.4.0-21-generic x86_64 (64 bit gcc: 5.3.1)
Desktop: Cinnamon 3.0.7 (Gtk 2.24.30) Distro: Linux Mint 18 Sarah
CPU: Dual core Intel Core i5-5200U (-HT-MCP-) cache: 3072 KB
flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 8786
clock speeds: max: 2700 MHz 1: 2200 MHz 2: 2202 MHz 3: 2200 MHz 4: 2200 MHz
Answer:
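This effect is most likely caused by turbo mode (Intel Turbo Boost Technology). Turbo mode allows a processor core to run above its nominal clock frequency. One of the factors is a time-based burst*: typically, for a fraction of a second, the processor runs at its highest frequency. Most likely the first loop runs at a higher clock frequency than the second one.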
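One way to see such a burst directly is to time many consecutive passes instead of two. The following is a minimal sketch, assuming Linux with clock_gettime(CLOCK_MONOTONIC); the names time_passes, elapsed_ns and pass_data are hypothetical, and the loop body mirrors the question's islower() loop.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() under strict -std modes */
#include <ctype.h>
#include <stdlib.h>
#include <time.h>

#define PASS_SIZE 0xffffff /* same element count as TESTSIZ in the question */

static char pass_data[PASS_SIZE];

/* Nanoseconds between two CLOCK_MONOTONIC readings. A negative tv_nsec
   difference is fine: the seconds term compensates for it. */
static long long elapsed_ns(struct timespec beg, struct timespec end)
{
    return 1000000000LL * (long long)(end.tv_sec - beg.tv_sec)
           + (long long)(end.tv_nsec - beg.tv_nsec);
}

/* Hypothetical helper: fills the array once, then times npasses
   back-to-back islower() passes, storing each duration (ns) in out[]. */
static void time_passes(long long *out, int npasses)
{
    volatile int sink = 0; /* volatile so the loop body is not optimized away */
    size_t i;
    int p;

    for (i = 0; i < PASS_SIZE; i++)
        pass_data[i] = (char)(rand() % 128);

    for (p = 0; p < npasses; p++) {
        struct timespec beg, end;
        clock_gettime(CLOCK_MONOTONIC, &beg);
        for (i = 0; i < PASS_SIZE; i++)
            sink = islower((unsigned char)pass_data[i]);
        clock_gettime(CLOCK_MONOTONIC, &end);
        out[p] = elapsed_ns(beg, end);
    }
    (void)sink;
}
```

Calling time_passes with, say, 200 passes (a few seconds of work at roughly 13 ms per pass) and printing out[] would be expected to show the per-pass time stepping up once the turbo burst ends; with turbo disabled, the values should stay roughly flat.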
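You can confirm this by manually fixing the frequency at the nominal value (2.20 GHz for your processor), for example using cpufrequtils or cpupower. However, many systems use intel_pstate, which does not allow the userspace governor. On such systems you can instead disable turbo mode for intel_pstate, or disable intel_pstate altogether.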
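Before changing anything, it can help to check which frequency driver is active and whether turbo is currently enabled. This is a minimal sketch, assuming a Linux system that exposes cpufreq through sysfs; the sysfs files are the kernel's, while the helper names read_first_line and report_turbo_state are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Reads the first line of a text file (e.g. a sysfs attribute) into buf,
   stripping the newline. Returns 0 on success, -1 on failure. */
static int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Prints cpu0's cpufreq driver and governor and, if intel_pstate is active,
   its no_turbo setting. Returns 1 if turbo is disabled, 0 if it is enabled,
   or -1 if intel_pstate/no_turbo is not available. */
static int report_turbo_state(void)
{
    char buf[64];

    if (read_first_line("/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver",
                        buf, sizeof buf) == 0)
        printf("scaling_driver:   %s\n", buf);
    if (read_first_line("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
                        buf, sizeof buf) == 0)
        printf("scaling_governor: %s\n", buf);
    if (read_first_line("/sys/devices/system/cpu/intel_pstate/no_turbo",
                        buf, sizeof buf) != 0) {
        printf("intel_pstate/no_turbo not available\n");
        return -1;
    }
    printf("intel_pstate/no_turbo: %s (1 = turbo disabled)\n", buf);
    return buf[0] == '1';
}
```

With intel_pstate, turbo can be switched off as root by writing 1 to that file, e.g. echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo, and re-enabled by writing 0.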
Without turbo mode, performance should be uniform across passes.
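*: Temperature is another factor, but I doubt it plays a role within the roughly 10 ms of the benchmark. To illustrate, assume the CPU exceeds its 15 W TDP and draws 20 W: even a tiny 1 g piece of copper would only heat up by 0.5 K after 10 ms. What I usually see is a distinct short burst (time-based, tens of milliseconds to seconds), followed by a slow, steady decline (temperature-based, tens of seconds to minutes).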
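The 0.5 K figure follows from the heat absorbed, Q = P t, divided by the heat capacity of the copper, m c. This assumes all 20 W goes into the copper with no losses and takes the specific heat of copper as about 0.385 J/(g·K):

```latex
\Delta T = \frac{P\,t}{m\,c_{\mathrm{Cu}}}
         = \frac{20\,\mathrm{W} \times 0.01\,\mathrm{s}}
                {1\,\mathrm{g} \times 0.385\,\mathrm{J\,g^{-1}\,K^{-1}}}
         = \frac{0.2\,\mathrm{J}}{0.385\,\mathrm{J\,K^{-1}}}
         \approx 0.52\,\mathrm{K}
```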
Note: gentestcases runs for a considerable time (e.g. 240 ms) before the first actual test, which also counts toward the processor's "sprint".