Question

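I have a program in which a simple function is called many times. I added some simple logging code and found that it significantly affects performance, even when the logging code is never actually called. A complete (but simplified) test case is shown below: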

#include <chrono>
#include <cstdint>
#include <iostream>
#include <random>
#include <sstream>

using namespace std::chrono;

std::mt19937 rng;

uint32_t getValue()
{
    // Just some pointless work, helps stop this function from getting inlined.
    for (int x = 0; x < 100; x++)
    {
        rng();
    }

    // Get a value, which happens never to be zero
    uint32_t value = rng();

    // This (by chance) is never true
    if (value == 0)
    {
        value++; // This if statement won't get optimized away when printing below is commented out.

        std::stringstream ss;
        ss << "This never gets printed, but commenting out these three lines improves performance." << std::endl;
        std::cout << ss.str();
    }

    return value;
}

int main(int argc, char* argv[])
{
    // Just for timing
    high_resolution_clock::time_point start = high_resolution_clock::now();

    uint32_t sum = 0;   
    for (uint32_t i = 0; i < 10000000; i++)
    {
        sum += getValue();  
    }

    milliseconds elapsed = duration_cast<milliseconds>(high_resolution_clock::now() - start);

    // Use (print) the sum to make sure it doesn't get optimized away.
    std::cout << "Sum  = " << sum << ", Elapsed = " << elapsed.count() << "ms" << std::endl;
    return 0;
}

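Note that the code contains a stringstream and cout, but they are never actually called. Nevertheless, the presence of these three lines of code increases the run time from 2.9 seconds to 3.3 seconds. This is in release mode on VS2013. Oddly, if I build with GCC using the '-O3' flag, the extra three lines actually decrease the run time by about half a second.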

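I understand that the extra code could affect the generated executable in a number of ways, such as preventing inlining or causing more cache misses. The real question is whether there is anything I can do to improve the situation. Switching to sprintf()/printf() doesn't seem to make a difference. Do I simply need to accept that adding such logging code to small functions will affect performance, even when it is not called?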

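Note: For completeness, my real/full scenario is that I use a wrapper macro to throw exceptions, and I like to log when such an exception is thrown. So when I call THROW_EXCEPT(...), it inserts code similar to that shown above and then throws. This hurts when I throw exceptions from inside a small function. Are there any better alternatives here?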

Edit: Here is a VS2013 solution for quick testing, so the compiler settings can be checked: https://drive.google.com/file/d/0B7b4UnjhhIiEamFyS0hjSnVzbGM/view?usp=sharing

Answer 1

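I initially thought this was due to branch prediction and the branch being optimized out, so I took a look at the annotated assembly for when the code is commented out: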

    if (value == 0)
00E21371  mov         ecx,1  
00E21376  cmove       eax,ecx  
    {
        value++;

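Here we see that the compiler has helpfully optimized out our branch. So what if we put in a more complicated statement to prevent it from doing so: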

if (value == 0)
00AE1371  jne         getValue+99h (0AE1379h)  
    {
        value /= value;
00AE1373  xor         edx,edx  
00AE1375  xor         ecx,ecx  
00AE1377  div         eax,ecx

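Here the branch is left in, but when run it is about as fast as the previous example with the logging lines commented out. So let's have a look at the assembly with those lines left in: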

if (value == 0)
008F13A0  jne         getValue+20Bh (08F14EBh)  
    {
        value++;     
        std::stringstream ss;
008F13A6  lea         ecx,[ebp-58h]  
008F13A9  mov         dword ptr [ss],8F32B4h  
008F13B3  mov         dword ptr [ebp-0B0h],8F32F4h  
008F13BD  call        dword ptr ds:[8F30A4h]  
008F13C3  push        0  
008F13C5  lea         eax,[ebp-0A8h]  
008F13CB  mov         dword ptr [ebp-4],0  
008F13D2  push        eax  
008F13D3  lea         ecx,[ss]  
008F13D9  mov         dword ptr [ebp-10h],1  
008F13E0  call        dword ptr ds:[8F30A0h]  
008F13E6  mov         dword ptr [ebp-4],1  
008F13ED  mov         eax,dword ptr [ss]  
008F13F3  mov         eax,dword ptr [eax+4]  
008F13F6  mov         dword ptr ss[eax],8F32B0h  
008F1401  mov         eax,dword ptr [ss]  
008F1407  mov         ecx,dword ptr [eax+4]  
008F140A  lea         eax,[ecx-68h]  
008F140D  mov         dword ptr [ebp+ecx-0C4h],eax  
008F1414  lea         ecx,[ebp-0A8h]  
008F141A  call        dword ptr ds:[8F30B0h]  
008F1420  mov         dword ptr [ebp-4],0FFFFFFFFh

That's a lot of instructions if that branch is hit. So what if we try something else?

    if (value == 0)
011F1371  jne         getValue+0A6h (011F1386h)  
    {
        value++;
        printf("This never gets printed, but commenting out these three lines improves performance.");
011F1373  push        11F31D0h  
011F1378  call        dword ptr ds:[11F30ECh]  
011F137E  add         esp,4

Here we have far fewer instructions, and once again it runs as fast as with all the lines commented out.

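So I'm not sure I can say exactly what is happening here, but my feeling at the moment is that it is a combination of branch prediction and CPU instruction cache misses.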

To work around this problem, you could move the logging into a function like so:

void log()
{
    std::stringstream ss;
    ss << "This never gets printed, but commenting out these three lines improves performance." << std::endl;
    std::cout << ss.str();
}

and

if (value == 0)
{
    value++;
    log();
}

It then runs as fast as before, with all those instructions replaced by a single call log (011C12E0h).
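The same idea can be carried over to the THROW_EXCEPT(...) scenario from the question. The following is only a minimal sketch under stated assumptions: `COLD_NOINLINE`, `logAndThrow` and `checkedValue` are hypothetical names, and the real macro presumably takes different arguments. The intent is that the macro expands to a single call, while the stream construction stays in one out-of-line function that the compiler is asked not to inline back into the caller. Whether this recovers the lost time on a given compiler would need to be measured.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <sstream>
#include <stdexcept>

// Hypothetical portability macro (an assumption, not from the original post):
// asks the compiler to keep the helper out of line.
#if defined(_MSC_VER)
#  define COLD_NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#  define COLD_NOINLINE __attribute__((noinline, cold))
#else
#  define COLD_NOINLINE
#endif

// Hypothetical helper: all stringstream work lives here, so each call site
// only contains a compare, a branch, and a call.
COLD_NOINLINE void logAndThrow(const char* file, int line, const char* message)
{
    assert(message != nullptr);
    std::ostringstream ss;
    ss << file << ':' << line << ": " << message << '\n';
    std::cerr << ss.str();
    throw std::runtime_error(message);
}

// Stand-in for the wrapper macro mentioned in the question; the real
// signature is unknown, so this one is an assumption.
#define THROW_EXCEPT(message) logAndThrow(__FILE__, __LINE__, (message))

// Example of a small, frequently called function that uses the macro.
std::uint32_t checkedValue(std::uint32_t value)
{
    if (value == 0)
    {
        THROW_EXCEPT("value must not be zero");
    }
    return value;
}
```

Calling `checkedValue(7)` simply returns its argument, while `checkedValue(0)` writes one line to `std::cerr` and throws `std::runtime_error`. Looking at the generated assembly for `checkedValue`, as done above for `getValue`, is a reasonable way to confirm that the branch body has shrunk to a single call.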

Adding stringstream/cout hurts performance, even when the code is never called