Question

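I wanted to see how much a program's performance improves when a variable's value is not copied into another variable before a comparison (the examples explain this better), and I noticed something odd. I have these two code segments: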

string a = "";
for (int i = 0; i < 1000000; i++) a += 'a';

for (int i = 0; i < 1000000; i++) {
    if ('b' == a.at(i));//compare the two chars directly
}

and

string a = "";
for (int i = 0; i < 100000000; i++) a += 'a';

for (int i = 0; i < 100000000; i++) {
    char c = a.at(i);//declare a new variable
    if ('b' == c);//compare the char with the newly created variable,
                  //instead of comparing it to the other char directly
}

I expected the second segment to take longer to execute, since it declares one more variable than the first. When I actually timed both, I found that the second one took less time than the first. I timed it several times, and the second one always seems to take about 0.13 seconds to execute. The complete program is in the edit below.

Why is this?

EDIT: I followed NathanOliver's suggestion and added a separate string for each loop, so the code now looks like this:

#include <string>
#include <iostream>
#include <ctime>

using namespace std;

int main() {
    clock_t timer;

    string a = "";
    string b;

    for (int i = 0; i < 100000000; i++)
        a += "a";

    timer = clock();

    for (int i = 0; i < 100000000; i++) {
        if ('b'==a.at(i)) b += "a";
    }

    cout << (clock()-timer)/(float)CLOCKS_PER_SEC << "sec" << endl;

    timer = clock();

    for (int i = 0; i < 100000000; i++) {
        char c = a.at(i);
        if ('b'==c) b += "a";
    }

    cout << (clock()-timer)/(float)CLOCKS_PER_SEC << "sec" << endl;

    return 0;
}

Answer 1

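With Visual C++ 2010 I got the same timing results as in the comments above: on average, the second loop takes about 80% of the first loop's runtime. Once or twice the first loop was slightly faster, but that could be due to some thread hiccup in the OS. Checking the disassembly yielded the following:

First loop: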

01231120  cmp         dword ptr [ebp-38h],esi  
01231123  jbe         main+1CBh (123120Bh)  
01231129  cmp         dword ptr [ebp-34h],10h  
0123112D  mov         eax,dword ptr [ebp-48h]  
01231130  jae         main+0F5h (1231135h)  
01231132  lea         eax,[ebp-48h]  
01231135  cmp         byte ptr [eax+esi],62h  
01231139  jne         main+108h (1231148h)  
0123113B  mov         ebx,1  
01231140  lea         eax,[ebp-80h]  
01231143  call        std::basic_string<char,std::char_traits<char>,std::allocator<char> >::append (1231250h)  
01231148  inc         esi  
01231149  cmp         esi,5F5E100h  
0123114F  jl          main+0E0h (1231120h)

Second loop:

01231155  cmp         dword ptr [ebp-1Ch],esi  
01231158  jbe         main+1CBh (123120Bh)  
0123115E  cmp         dword ptr [ebp-18h],10h  
01231162  mov         eax,dword ptr [ebp-2Ch]  
01231165  jae         main+12Ah (123116Ah)  
01231167  lea         eax,[ebp-2Ch]  
0123116A  cmp         byte ptr [eax+esi],62h  
0123116E  jne         main+13Dh (123117Dh)  
01231170  mov         ebx,1  
01231175  lea         eax,[ebp-64h]  
01231178  call        std::basic_string<char,std::char_traits<char>,std::allocator<char> >::append (1231250h)  
0123117D  inc         esi  
0123117E  cmp         esi,5F5E100h  
01231184  jl          main+115h (1231155h)

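Since the generated assembly looks more or less identical, I thought about throttling mechanisms in the OS or CPU, and guess what? Adding Sleep(5000); between the two loops makes the second loop (almost) always slower than the first. Over 20 runs, the second loop takes on average about 150% of the first loop's runtime.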

EDIT: Increasing the spincount fivefold gives the same results. I assume runtimes of around 0.5s can be measured more or less reliably. :-)
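For anyone who wants to repeat the pause experiment outside Windows, here is a minimal sketch. It is not the code I ran: std::this_thread::sleep_for stands in for Sleep(5000), std::chrono::steady_clock replaces clock(), and the names time_direct_loop and time_with_pause are made up for illustration. It deliberately times the same loop twice, so any difference between the two numbers comes from the pause and the run order rather than from the code.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <thread>
#include <utility>

// Time one pass over `a` using the direct comparison from the question.
// Matches are appended to `sink` so the loop has an observable effect.
double time_direct_loop(const std::string& a, std::string& sink) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < a.size(); ++i) {
        if ('b' == a.at(i)) sink += 'a';
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();  // seconds
}

// Run the same loop twice with an idle pause in between and return both
// timings as (first, second). sleep_for is a portable stand-in for Sleep().
std::pair<double, double> time_with_pause(const std::string& a,
                                          std::chrono::milliseconds pause) {
    std::string sink;
    const double first = time_direct_loop(a, sink);
    std::this_thread::sleep_for(pause);
    const double second = time_direct_loop(a, sink);
    return {first, second};
}
```

Calling time_with_pause(a, std::chrono::milliseconds(5000)) with a string of 100000000 characters would mirror the experiment above; calling it with a pause of 0 gives the no-sleep baseline to compare against.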

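In the original code, I think the OS may need a few samples to detect the CPU load before it starts giving the thread higher priority during scheduling, and the CPU may also boost only after a while, leaving part of the first loop "unboosted". By the time the second loop starts, the OS/CPU may already be primed for the heavy workload and execute it a bit faster. The same could happen with MMU or OS-internal memory page handling. With a Sleep between the loops, the opposite may happen: the OS sets the thread aside for a while until it eventually detects the new workload, which makes the second loop run slightly slower.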
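If the warm-up guess above is right, the measurement itself should be made less sensitive to run order. The following is only a sketch under that assumption: count_direct and count_via_copy mirror the two variants from the question, and best_of is a hypothetical helper that does one untimed warm-up run and then reports the fastest of several timed runs. The loops return a hit count so the compiler cannot drop them entirely, although an optimizing build may still transform them.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Variant 1 from the question: compare a.at(i) directly.
std::size_t count_direct(const std::string& a, char needle) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (needle == a.at(i)) ++hits;
    }
    return hits;
}

// Variant 2 from the question: copy the char into a local variable first.
std::size_t count_via_copy(const std::string& a, char needle) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        char c = a.at(i);
        if (needle == c) ++hits;
    }
    return hits;
}

// Hypothetical helper: one untimed warm-up run, then `reps` timed runs.
// Returns the fastest time in seconds, or -1.0 if reps is not positive.
// The last hit count is written to `hits_out`.
template <typename Fn>
double best_of(Fn fn, const std::string& a, int reps, std::size_t& hits_out) {
    hits_out = fn(a, 'b');  // warm-up, not timed
    double best = -1.0;
    for (int r = 0; r < reps; ++r) {
        const auto start = std::chrono::steady_clock::now();
        hits_out = fn(a, 'b');
        const std::chrono::duration<double> dt =
            std::chrono::steady_clock::now() - start;
        if (best < 0.0 || dt.count() < best) best = dt.count();
    }
    return best;
}
```

Running best_of(count_direct, a, 10, hits) and best_of(count_via_copy, a, 10, hits), and then again in the reverse order, should show whether the gap follows the code or the position in the run.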

What are your results? Does anyone have a proper profiler such as Intel Amplifier at hand to measure CPI rates and CPU speed during the loops?

Odd difference in execution time between two code segments
