Is there a penalty for using static variables in C++11?

Time: 2014-01-31 19:36:34

Tags: c++ multithreading performance c++11 static

In C++11, this:

const std::vector<int>& f() {
    static const std::vector<int> x { 1, 2, 3 };
    return x;
}

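is thread-safe. But because of this extra thread-safety guarantee, is there an additional penalty for calling this function after the first time (that is, after initialization)? I'm wondering whether the function will be slower than one that uses a global variable, because on every call it has to acquire a mutex to check whether another thread is initializing the variable, or something along those lines.

2 Answers:

Answer 0: (score: 9)

"The best intuition to be ever had is 'I should measure this.'" So let's find out: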

#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <type_traits>  // std::decay
#include <vector>

namespace {
class timer {
    using hrc = std::chrono::high_resolution_clock;
    hrc::time_point start;

    static hrc::time_point now() {
      // Prevent memory operations from reordering across the
      // time measurement. This is likely overkill, needs more
      // research to determine the correct fencing.
      std::atomic_thread_fence(std::memory_order_seq_cst);
      auto t = hrc::now();
      std::atomic_thread_fence(std::memory_order_seq_cst);
      return t;
    }

public:
    timer() : start(now()) {}

    hrc::duration elapsed() const {
      return now() - start;
    }

    template <typename Duration>
    typename Duration::rep elapsed() const {
      return std::chrono::duration_cast<Duration>(elapsed()).count();
    }

    template <typename Rep, typename Period>
    Rep elapsed() const {
      return elapsed<std::chrono::duration<Rep,Period>>();
    }
};

const std::vector<int>& f() {
    static const auto x = std::vector<int>{ 1, 2, 3 };
    return x;
}

static const auto y = std::vector<int>{ 1, 2, 3 };
const std::vector<int>& g() {
    return y;
}

const unsigned long long n_iterations = 500000000;

template <typename F>
void test_one(const char* name, F f) {
  f(); // First call outside the timer.

  using value_type = typename std::decay<decltype(f()[0])>::type;
  std::cout << name << ": " << std::flush;

  auto t = timer{};
  auto sum = std::uint64_t{};
  for (auto i = n_iterations; i > 0; --i) {
    const auto& vec = f();
    sum += std::accumulate(begin(vec), end(vec), value_type{});
  }
  const auto elapsed = t.elapsed<std::chrono::milliseconds>();
  std::cout << elapsed << " ms (" << sum << ")\n";
}
} // anonymous namespace

int main() {
  test_one("local static", f);
  test_one("global static", g);
}

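Run on Coliru, the local version completes 5e8 iterations in 4618 ms and the global version in 4392 ms. So yes, the local version is about 0.452 nanoseconds slower per iteration. Although the difference is measurable, it is too small to affect observed performance in most cases.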

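EDIT: Interesting counterpoint: switching from clang++ to g++ changes the result ordering. The g++-compiled binary runs in 4418 ms (global) versus 4181 ms (local), so local is faster by 474 picoseconds per iteration. It nonetheless reaffirms the conclusion that the difference between the two approaches is small.

EDIT 2: After examining the generated assembly, I decided to switch from function pointers to function objects for better inlining. Timing indirect calls through function pointers isn't really characteristic of the code in the OP. So I used this program: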

#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <type_traits>  // std::decay
#include <vector>

namespace {
class timer {
    using hrc = std::chrono::high_resolution_clock;
    hrc::time_point start;

    static hrc::time_point now() {
      // Prevent memory operations from reordering across the
      // time measurement. This is likely overkill.
      std::atomic_thread_fence(std::memory_order_seq_cst);
      auto t = hrc::now();
      std::atomic_thread_fence(std::memory_order_seq_cst);
      return t;
    }

public:
    timer() : start(now()) {}

    hrc::duration elapsed() const {
      return now() - start;
    }

    template <typename Duration>
    typename Duration::rep elapsed() const {
      return std::chrono::duration_cast<Duration>(elapsed()).count();
    }

    template <typename Rep, typename Period>
    Rep elapsed() const {
      return elapsed<std::chrono::duration<Rep,Period>>();
    }
};

class f {
public:
    const std::vector<int>& operator()() {
        static const auto x = std::vector<int>{ 1, 2, 3 };
        return x;
    }
};

class g {
    static const std::vector<int> x;
public:
    const std::vector<int>& operator()() {
        return x;
    }
};

const std::vector<int> g::x{ 1, 2, 3 };

const unsigned long long n_iterations = 500000000;

template <typename F>
void test_one(const char* name, F f) {
  f(); // First call outside the timer.

  using value_type = typename std::decay<decltype(f()[0])>::type;
  std::cout << name << ": " << std::flush;

  auto t = timer{};
  auto sum = std::uint64_t{};
  for (auto i = n_iterations; i > 0; --i) {
    const auto& vec = f();
    sum += std::accumulate(begin(vec), end(vec), value_type{});
  }
  const auto elapsed = t.elapsed<std::chrono::milliseconds>();
  std::cout << elapsed << " ms (" << sum << ")\n";
}
} // anonymous namespace

int main() {
  test_one("local static", f());
  test_one("global static", g());
}

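Unsurprisingly, runtimes were faster under both g++ (3803 ms local, 2323 ms global) and clang (4183 ms local, 3253 ms global). The results confirm our intuition that the global technique should be faster than the local one, with deltas of 2.96 nanoseconds (g++) and 1.86 nanoseconds (clang) per iteration.

Answer 1: (score: 5)

Yes, there is a cost to checking whether the object has been initialized. This typically tests an atomic boolean variable rather than locking a mutex.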