Question

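I am trying to measure the execution time of a dot product, but the measured time depends on the variable used to store the final result: when I accumulate into an int the reported time is 0 ms, but when I accumulate into an array element the time is much higher.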

Is this related to the compiler? Is it able to vectorize the loop when an integer variable is used?

Here is my code:

#include <cstdio>
#include <ctime>
#include <iostream>

using namespace std;

int main(int argc, char* argv[])
{
    // 2,000,000,000 ints: roughly 8 GB with a 4-byte int.
    int* a = new int[2000000000];
    for (unsigned long long i = 0; i < 2000000000; i++)
        a[i] = 1;

    // First measurement: accumulate into a plain int.
    clock_t t1 = clock();
    int nResult = 0;
    for (unsigned long long i = 0; i < 2000000000; i++)
        nResult += a[i] * a[i];
    clock_t t2 = clock();
    cout << "Execution time = " << (int)(1000 * ((t2 - t1) / (double)CLOCKS_PER_SEC)) << " ms" << endl;

    // Second measurement: accumulate into an array element.
    t1 = clock();
    int b[1] = {0};
    for (unsigned long long i = 0; i < 2000000000; i++)
        b[0] += a[i] * a[i];
    t2 = clock();
    cout << "Execution time = " << (int)(1000 * ((t2 - t1) / (double)CLOCKS_PER_SEC)) << " ms" << endl;

    delete[] a;

    getchar();

    return 0;
}

Here is the output:

Execution time = 0 ms
Execution time = 702 ms

Thanks in advance for your help.

Answer 1

I don't think this has anything to do with vectorization. Even with SIMD instructions you would not get down to a flat 0 ms.

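What seems to be happening instead is that the loop is removed entirely. You never use the value of nResult, and even if you did, the optimizer could work out what that value is and simply put it into the variable at compile time.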
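
To illustrate the point about the unused result, here is a minimal sketch in which the sum is printed after the timed region, so it becomes an observable output and the loop cannot simply be dropped as dead code. The helper names `sum_of_squares` and `timed_sum_of_squares` are made up for this example, and `std::vector` and `std::chrono::steady_clock` are swapped in for the raw array and `clock()` from the question. It is only an approximation of a proper benchmark.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical helper: computes the sum of a[i] * a[i] over the whole vector.
long long sum_of_squares(const std::vector<int>& a)
{
    long long result = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        result += static_cast<long long>(a[i]) * a[i];
    return result;
}

// Hypothetical helper: times one call and prints both the result and the time.
long long timed_sum_of_squares(const std::vector<int>& a)
{
    const auto t1 = std::chrono::steady_clock::now();
    const long long result = sum_of_squares(a);
    const auto t2 = std::chrono::steady_clock::now();
    const auto ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();

    // Printing the result makes it observable, so the loop is no longer dead code.
    std::cout << "Result = " << result
              << ", execution time = " << ms << " ms" << std::endl;
    return result;
}
```

If the vector passed to `timed_sum_of_squares` is filled with values that are only known at run time (for example read from input), the optimizer also has no way to precompute the sum.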

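Benchmarking is a counterintuitive topic: you need to defeat certain compiler optimizations in order to actually measure something, while still benchmarking the optimized code that a normal program would contain.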
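
One common way to leave the optimizer switched on while still forcing the work to happen is to store the result through a `volatile` object, because a volatile write counts as observable behaviour. Below is a minimal sketch of that idea. The names `g_sink` and `time_dot_us` are hypothetical, and this is not a complete optimization barrier: the compiler may still vectorize or otherwise transform the loop (which is what you want to measure), and in principle it may also move work relative to the clock calls.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical sink: every value stored here must actually be produced,
// because writes to a volatile object cannot be optimized away.
volatile long long g_sink = 0;

// Hypothetical helper: runs the loop once and returns the elapsed
// time in microseconds.
long long time_dot_us(const std::vector<int>& a)
{
    const auto t1 = std::chrono::steady_clock::now();

    long long result = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        result += static_cast<long long>(a[i]) * a[i];

    // Store the result without doing any I/O inside the timed region.
    g_sink = result;

    const auto t2 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
}
```

Compared with printing the result, the volatile store keeps I/O out of the timed region, at the cost of one extra global.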

You may want to watch this talk, which does a very good job of explaining how to benchmark code properly: https://youtu.be/nXaxk27zwlk
