Question

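I was curious about the difference between for(;;) and for(:), especially in terms of speed. So I ran a small test: I filled a vector with 10 million integers and added them all up. I found that for(:) was slower by a factor of 1.3.

What would cause for(:) to be that much slower!?

Edit: It seems that for(:) uses the vector's iterator, unlike for(;;), and that makes it take longer.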

/Yu"stdafx.h" /GS /analyze- /W3 /Zc:wchar_t /ZI /Gm /Od /sdl /Fd"Debug\vc120.pdb" /fp:precise /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_LIB" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /RTC1 /Gd /Oy- /MDd /Fa"Debug\" /EHsc /nologo /Fo"Debug\" /Fp"Debug\forvsForLoop.pch"

#include "stdafx.h"
#include <vector>
#include <iostream>
#include <chrono>
#include <cstdlib>  // srand, rand, system

void init(std::vector<int> &array){
    srand(20);
    for (int x = 0; x < 10000000; x++)
        array.push_back(rand());
    return;
}

unsigned long testForLoop(std::vector<int> &array){
    unsigned long result = 0;
    for (int x = 0; x < array.size(); x++)
        result += array[x];
    return result;
}
unsigned long testFor(std::vector<int> &array){
    unsigned long result = 0;
    for (const int &element : array)
        result += element;
    return result;
}
int _tmain(int argc, _TCHAR* argv[])
{
    std::vector<int> testingArray;

    init(testingArray);

    //Warm up
    std::cout << "warming up \n";
    testForLoop(testingArray);
    testFor(testingArray);
    testForLoop(testingArray);
    testFor(testingArray);
    testForLoop(testingArray);
    testFor(testingArray);
    std::cout << "starting \n";

    auto start = std::chrono::high_resolution_clock::now();
    testForLoop(testingArray);
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "ForLoop took: " <<  std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << std::endl;


    start = std::chrono::high_resolution_clock::now();
    testFor(testingArray);
    end = std::chrono::high_resolution_clock::now();
    std::cout << "For---- took: " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << std::endl;

    system("pause");
    return 0;

}

Answer 1

If you are using:

for ( auto x : ... )

then each x is a copy. You can reduce that overhead with:

for ( const auto & x : ... )
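
To make the copy overhead visible, here is a minimal sketch. The `Tracked` type and both function names are hypothetical, introduced only to count copy constructions; they are not from the question. For a plain `int`, as in the question's code, a copy is cheap, so this mostly matters for larger element types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how often it is copy-constructed.
struct Tracked {
    static std::size_t copies;
    int value = 0;
    Tracked() = default;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    Tracked& operator=(const Tracked&) = default;
};
std::size_t Tracked::copies = 0;

// Iterates by value: every element is copy-constructed into x.
long sumByValue(const std::vector<Tracked>& v) {
    long total = 0;
    for (auto x : v) total += x.value;
    return total;
}

// Iterates by const reference: x is bound to the element, no copy is made.
long sumByRef(const std::vector<Tracked>& v) {
    long total = 0;
    for (const auto& x : v) total += x.value;
    return total;
}
```

Resetting `Tracked::copies` to zero before each call and reading it afterwards shows one copy per element for `sumByValue` and none for `sumByRef`.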

Answer 2

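The standard says nothing about performance or implementation. Both loops should work correctly, and under normal circumstances their performance should be equal. Nobody can say why it is so slow in MSVC++ other than by calling it a bug or a poor implementation. Perhaps you should adjust your optimization settings.

I tested your code with MSVC++, GCC and Clang.

GCC output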

ForLoop took: 7879773
For---- took: 5786831

Clang output

ForLoop took: 6537441
For---- took: 6743614

and MSVC++ output

ForLoop took: 77786200
For---- took: 249612200

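Both GCC and Clang give reasonable results, with the two loops close to each other, as expected. The MSVC++ result, however, is odd and unrealistic. I would call it a bug or a regression. Alternatively, your build configuration is at fault; try other optimization settings.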

Answer 3

To make sure the tests were not optimized away, I printed out the results:

 auto x = testForLoop(......

 // ^^^
 ......nd - start).count() << "  R: " << x << std::endl;

                          //  ^^^^^^^^^^^^^^^^
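
The elided snippet above can be fleshed out roughly as follows. This is only a sketch of the idea (capture the sum and print it next to the timing so the compiler has to compute it); the helper name `timeAndReport` and the two sum functions are assumptions made for illustration, not the exact code used for the numbers below.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Index-based loop, as in the question's testForLoop.
unsigned long long sumIndexed(const std::vector<int>& v) {
    unsigned long long result = 0;
    for (std::size_t i = 0; i < v.size(); ++i) result += v[i];
    return result;
}

// Range-based loop, as in the question's testFor.
unsigned long long sumRanged(const std::vector<int>& v) {
    unsigned long long result = 0;
    for (const int& e : v) result += e;
    return result;
}

// Hypothetical helper: times one call and prints the result with the timing,
// so the computed sum is observable and cannot simply be discarded.
template <typename Fn>
unsigned long long timeAndReport(const char* label, Fn fn,
                                 const std::vector<int>& data) {
    auto start = std::chrono::steady_clock::now();
    unsigned long long r = fn(data);
    auto end = std::chrono::steady_clock::now();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    std::cout << label << " took: " << ns << "  R: " << r << '\n';
    return r;
}
```

A call such as `timeAndReport("ForLoop", sumIndexed, testingArray)` then prints both the elapsed nanoseconds and the sum, and matching `R:` values confirm that both loops did the same work.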

Normal mode, no optimization (for(:) takes roughly 1.5 times as long):

> g++ -std=c++11 v.cpp
> ./a.out
warming up
starting
ForLoop took: 33262788  R: 10739647121123056
For---- took: 51263111   R: 10739647121123056

Optimized (nearly identical):

> g++ -O3 -std=c++11 v.cpp
> ./a.out
warming up
starting
ForLoop took: 4861314  R: 10739647121123056
For---- took: 4997957   R: 10739647121123056

Answer 4

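Any answer here is guesswork and depends on the exact code and optimization settings used. The underlying platform can also change how the code behaves.

There are basically two "low-level" ways to manage an iteration: one is based on a "reassignable pointer", the other on a "constant pointer plus an offset".

In pseudocode: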

loop { *a = *b; ++a; ++b; }

vs.

loop { a[i] = b[i]; ++i; }

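Depending on the processor architecture, the two are not the same: they behave differently with respect to register usage, address locality and caching. The first performs two additions with a constant held in memory; the second performs two additions with a register plus one register increment. (Both also perform a memory copy.)

On the x86 platform the second is preferable, since it makes fewer memory accesses and uses instructions that require fewer memory fetches.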
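
As a rough C++ rendering of the two pseudocode forms above, the following sketch copies `n` ints either by advancing two pointers or by indexing from fixed bases. The function names are made up for illustration. Which form is faster is not something this code settles: an optimizing compiler may well emit the same machine code for both.

```cpp
#include <cstddef>

// Form 1: "reassignable pointers". Both pointers are advanced every iteration.
void copyByPointer(int* a, const int* b, std::size_t n) {
    for (const int* end = b + n; b != end; ++a, ++b) {
        *a = *b;
    }
}

// Form 2: "constant pointer plus offset". The bases stay fixed and only
// the index i is incremented.
void copyByIndex(int* a, const int* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = b[i];
    }
}
```

Comparing the generated assembly for both functions at different optimization levels is the most direct way to see whether the distinction survives on a given compiler and platform.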

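Now, an iterator-based loop applied to a vector (whose iterators wrap pointers) results in the first form, while a traditional index-based loop results in the second.

Now, for(a: v) { .... } is equivalent to for(auto i=v.begin(); i!=v.end(); ++i) { auto& a=*i; ... }

It works with any kind of container (including ones that are not sequential in memory), but it cannot be reduced to an index-based loop unless the compiler's optimizer is good enough to discover that the iterator is in fact a pointer moving by a constant increment.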
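
The equivalence can be checked with a short sketch. The two function templates below are hypothetical names chosen for illustration; the second is the hand-written iterator loop from the text. Both accept a `std::vector` as well as a `std::list`, which has no `operator[]` and is not contiguous in memory.

```cpp
#include <list>
#include <vector>

// Range-based for: the compiler expands this into an iterator loop.
template <typename Container>
long sumRangeFor(const Container& c) {
    long total = 0;
    for (const auto& a : c) total += a;
    return total;
}

// Hand-written equivalent. One difference from the real expansion: the
// standard's version evaluates end() once, while this loop calls it on
// every iteration.
template <typename Container>
long sumIterators(const Container& c) {
    long total = 0;
    for (auto i = c.begin(); i != c.end(); ++i) {
        const auto& a = *i;
        total += a;
    }
    return total;
}
```

Calling both on the same `std::vector<int>` or `std::list<int>` yields the same sum, whereas an index-based loop would only compile for the vector.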
