I read that range-based loops perform better in some programming languages. Is that the case in C++? For example:
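#include <cstddef>
#include <vector>
using std::vector;

int main()
{
    vector<int> v = {1, 2, 3, 4, 5};
    auto size = v.size();
    // LOOP1: index-based loop (unsigned index to match v.size())
    for (std::size_t i = 0; i < size; i++) {
        // do something with v[i]
    }
    // LOOP2: range-based loop
    for (int& val : v) {
        // do something with val
    }
    return 0;
}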
Does LOOP2 perform better than LOOP1 when the vector is huge? If so, why?
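Answer 0 (score: 3)
Here is a rough test. I am not claiming this is a definitive answer on which is faster, but it appears that in this particular case the gcc compiler is able to optimize both loops to roughly the same level of performance. You can certainly improve the test methodology if you wish.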
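As background for why the two forms can end up equivalent: a range-based for over a std::vector is essentially shorthand for an iterator loop over begin() and end(). The sketch below is illustrative only; the function names square_indexed, square_ranged and square_iterators are made up for this example and are not part of the test program.

```cpp
#include <cstddef>
#include <vector>

// Squares every element using an index (the "old-fashioned" form).
void square_indexed(std::vector<int>& v) {
    const std::size_t size = v.size();
    for (std::size_t i = 0; i < size; ++i) {
        v[i] *= v[i];
    }
}

// Squares every element using the range-based form.
void square_ranged(std::vector<int>& v) {
    for (int& val : v) {
        val *= val;
    }
}

// Roughly what the range-based form expands to: begin() and end() are
// evaluated once, then the loop advances an iterator and dereferences it.
void square_iterators(std::vector<int>& v) {
    auto first = v.begin();
    auto last = v.end();
    for (; first != last; ++first) {
        int& val = *first;
        val *= val;
    }
}
```

With optimization enabled, a compiler has a good chance of reducing all three to similar machine code for a contiguous container such as std::vector. Without optimization, the iterator calls are typically not inlined, which may be why the unoptimized range-based loop measures slower.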
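On my system (Ubuntu 14.04, some kind of i7, 8 GB DDR3, gcc):
Without optimization (g++ main.cpp -std=c++11):
Old-fashioned loop: 5.45131 seconds.
Range-based loop: 9.90306 seconds.
With optimization (g++ main.cpp -O3 -std=c++11):
Old-fashioned loop: 0.469001 seconds.
Range-based loop: 0.467045 seconds.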
#include <iostream>
#include <vector>
#include <time.h>
using namespace std;

// Returns the time between start and end in seconds.
double time_elapsed(timespec& start, timespec& end)
{
    return ((1e9 * end.tv_sec + end.tv_nsec) -
            (1e9 * start.tv_sec + start.tv_nsec)) / 1.0e9;
}

int main()
{
    vector<int> v(1e9, 42);
    timespec start, end;

    // Old-fashioned loop.
    clock_gettime(CLOCK_MONOTONIC_RAW, &start);
    size_t size = v.size();
    for (size_t i = 0; i < size; i++)
    {
        v[i] *= v[i];
    }
    clock_gettime(CLOCK_MONOTONIC_RAW, &end);
    cout << "Old-fashioned loop: " << time_elapsed(start, end) << " seconds\n";

    // Range-based loop.
    clock_gettime(CLOCK_MONOTONIC_RAW, &start);
    for (int& val : v)
    {
        val *= val;
    }
    clock_gettime(CLOCK_MONOTONIC_RAW, &end);
    cout << "Range-based loop: " << time_elapsed(start, end) << " seconds.\n";
}
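If clock_gettime is not available on your platform, the same measurement can be sketched with std::chrono::steady_clock. This is only a rough variant of the test above, assuming a single run per loop is good enough; seconds_taken, Timing and compare_loops are hypothetical names introduced here. Each loop gets its own vector so that both start from identical data.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <class F>
double seconds_taken(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Holds one measurement of each loop style.
struct Timing {
    double indexed_s;
    double ranged_s;
    bool same_result;
};

// Squares n elements with each loop style and times both.
Timing compare_loops(std::size_t n) {
    std::vector<int> a(n, 42);
    std::vector<int> b(n, 42);
    Timing t{};
    t.indexed_s = seconds_taken([&a] {
        const std::size_t size = a.size();
        for (std::size_t i = 0; i < size; ++i) {
            a[i] *= a[i];
        }
    });
    t.ranged_s = seconds_taken([&b] {
        for (int& val : b) {
            val *= val;
        }
    });
    // Comparing the vectors also keeps the writes observable to the optimizer.
    t.same_result = (a == b);
    return t;
}
```

Calling compare_loops with the element count you care about and printing indexed_s and ranged_s gives the same kind of comparison as the program above. A single run is noisy, so repeating the call a few times is advisable.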
Answer 1 (score: 0)
I tested both loops with the following code:
#include <benchmark/benchmark.h>
#include <vector>

auto get_vector(int size) {
    return std::vector<int>(size, 2333);
}

template <class F>
void BM_for_range(benchmark::State &state)
{
    for (auto _ : state) {
        state.PauseTiming();
        F{}(get_vector(state.range(0)), state); // Run the test
    }
}

struct v1 {
    template <class T>
    void operator () (std::vector<T> v, benchmark::State &state) {
        state.ResumeTiming();
        auto size = v.size();
        decltype(size) i = 0;
        for (; i != size; i++) {
            v[i] *= v[i];
        }
    }
};

struct v2 {
    template <class T>
    void operator () (std::vector<T> v, benchmark::State &state) {
        state.ResumeTiming();
        for (int& val : v) {
            val *= val;
        }
    }
};

BENCHMARK_TEMPLATE(BM_for_range, v1)->Range(64, (1 << 30));
BENCHMARK_TEMPLATE(BM_for_range, v2)->Range(64, (1 << 30));
BENCHMARK_MAIN();
All benchmarks were run on:
Debian 9.4, Linux version 4.9.0-6-amd64 (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02)
Compiled with: clang++-6.0 -std=c++17 -lbenchmark -lpthread -Ofast main.cc
The results were obtained by running the following in bash:
sudo cpupower frequency-set --governor performance
./a.out
Results:
Run on (8 X 1600 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 6144K (x1)
---------------------------------------------------------------------
Benchmark                              Time            CPU Iterations
---------------------------------------------------------------------
BM_for_range<v1>/64                  929 ns         932 ns     763881
BM_for_range<v1>/512                1051 ns        1053 ns     662887
BM_for_range<v1>/4096               2195 ns        2199 ns     317788
BM_for_range<v1>/32768             12484 ns       12509 ns      57699
BM_for_range<v1>/262144           114788 ns      114784 ns       6277
BM_for_range<v1>/2097152         1270037 ns     1269506 ns        554
BM_for_range<v1>/16777216       16472508 ns    16468972 ns         43
BM_for_range<v1>/134217728     130165013 ns   130136049 ns          5
BM_for_range<v1>/1073741824    986169581 ns   986168129 ns          1
BM_for_range<v2>/64                  925 ns         927 ns     730624
BM_for_range<v2>/512                1060 ns        1062 ns     654477
BM_for_range<v2>/4096               2197 ns        2201 ns     319264
BM_for_range<v2>/32768             12288 ns       12314 ns      53352
BM_for_range<v2>/262144           112196 ns      112191 ns       6220
BM_for_range<v2>/2097152         1268441 ns     1267732 ns        553
BM_for_range<v2>/16777216       16627461 ns    16623624 ns         42
BM_for_range<v2>/134217728     127552348 ns   127551465 ns          6
BM_for_range<v2>/1073741824    963751205 ns   963152928 ns          1
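To put the largest run in perspective, the times can be converted to a per-element cost and a relative difference. Below is a small sketch of that arithmetic; ns_per_element and relative_difference are hypothetical helper names, not part of the benchmark.

```cpp
#include <cstdint>

// Average time spent per element, in nanoseconds.
double ns_per_element(double total_ns, std::uint64_t elements) {
    return total_ns / static_cast<double>(elements);
}

// Fraction by which `other_ns` is faster than `baseline_ns`
// (negative if it is slower).
double relative_difference(double baseline_ns, double other_ns) {
    return (baseline_ns - other_ns) / baseline_ns;
}
```

For the 1073741824-element rows (986169581 ns for v1, 963751205 ns for v2) this works out to roughly 0.92 ns per element for v1 and 0.90 ns for v2, a difference of a little over 2%. With only one iteration at that size, a gap this small is probably within measurement noise.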