Atomic memory ordering performance difference

Time: 2014-01-23 03:27:22

Tags: multithreading performance c++11 atomic

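I wrote a small test to check how atomic loads perform with different memory orderings, and I found that relaxed and sequentially consistent ordering perform the same. Is this just a suboptimal compiler implementation, or is it the result I should expect on x86 processors? I am using gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3). I compiled my test with -O2 optimization (which is why the second test, using a simple variable, shows zero execution time).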

Results:
Start volatile tests with 1000000000 iterations
volatile test took 689438 microseconds. Last value of local var is 1
Start simple var tests with 1000000000 iterations
simple var test took 0 microseconds. Last value of local var is 2
Start relaxed atomic tests with 1000000000 iterations
relaxed atomic test took 25655002 microseconds. Last value of local var is 3
Start sequentially consistent atomic tests with 1000000000 iterations
sequentially consistent atomic test took 24844000 microseconds. Last value of local var is 4

Here are the test functions:

#include <atomic>
#include <chrono>
#include <iostream>

using namespace std;
using namespace std::chrono;

std::atomic<int> atomic_var;

// Times `iterations` relaxed loads of atomic_var.
void relaxed_atomic_test(const unsigned iterations)
{
    cout << "Start relaxed atomic tests with " << iterations << " iterations" << endl;
    // duration_cast keeps this compiling where system_clock ticks are finer than 1 us.
    const microseconds start = duration_cast<microseconds>(system_clock::now().time_since_epoch());
    int local_var = 0;
    for(unsigned counter = 0; iterations != counter; ++counter)
    {
        local_var = atomic_var.load(memory_order_relaxed);
    }
    const microseconds end = duration_cast<microseconds>(system_clock::now().time_since_epoch());
    cout << "relaxed atomic test took " << (end - start).count()
         << " microseconds. Last value of local var is " << local_var << endl;
}

// Same loop as above, but with sequentially consistent loads.
void sequentially_consistent_atomic_test(const unsigned iterations)
{
    cout << "Start sequentially consistent atomic tests with "
         << iterations << " iterations" << endl;
    const microseconds start = duration_cast<microseconds>(system_clock::now().time_since_epoch());
    int local_var = 0;
    for(unsigned counter = 0; iterations != counter; ++counter)
    {
        local_var = atomic_var.load(memory_order_seq_cst);
    }
    const microseconds end = duration_cast<microseconds>(system_clock::now().time_since_epoch());
    cout << "sequentially consistent atomic test took " << (end - start).count()
         << " microseconds. Last value of local var is " << local_var << endl;
}
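
The two load tests differ only in the memory order, so they could share one timing helper. The following is a minimal sketch, not the code that produced the results above: the name time_loads_us is hypothetical, and it swaps in std::chrono::steady_clock on the assumption that a monotonic clock is preferable for measuring intervals.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not from the original post): times `iterations`
// loads of `var` using the memory order given as a template argument.
// Returns the elapsed time in microseconds; the last loaded value is
// written to `last_value`.
template <std::memory_order Order>
std::int64_t time_loads_us(const std::atomic<int>& var,
                           unsigned iterations,
                           int& last_value)
{
    // Assumption: steady_clock is used because it never jumps backwards.
    typedef std::chrono::steady_clock clock;
    const clock::time_point start = clock::now();
    int local_var = 0;
    for (unsigned counter = 0; counter != iterations; ++counter)
    {
        local_var = var.load(Order);
    }
    const clock::time_point end = clock::now();
    last_value = local_var;
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

A call such as time_loads_us<std::memory_order_relaxed>(atomic_var, 1000000000u, last) would then stand in for relaxed_atomic_test, with the printing done by the caller. Passing the order as a template argument keeps it a compile-time constant, so the compiler sees the same call as in the hand-written versions.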

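Update: I tried the same test, but writing to the atomic variable instead of reading it. The results are completely different: a write to a memory_order_relaxed atomic takes the same time as a write to a volatile: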

Start volatile tests with 1000000000 iterations
volatile test took 764088 microseconds. Last volatile_var value 999999999
Start simple var tests with 1000000000 iterations
simple var test took 0 microseconds. Last var value999999999
Start relaxed atomic tests with 1000000000 iterations
relaxed atomic test took 763968 microseconds. Last atomic_var value 999999999
Start sequentially consistent atomic tests with 1000000000 iterations
sequentially consistent atomic test took 15287267 microseconds. Last atomic_var value 999999999

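So I can conclude that, in a single thread, an atomic with relaxed memory ordering behaves like a volatile for store operations, and like an atomic with sequentially consistent memory ordering for load operations (with this processor and compiler).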
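
The store variant of the test is not shown in the question. A minimal sketch of what it might look like follows, assuming each iteration stores the loop counter (which would match the reported final value of 999999999 for 1000000000 iterations); the name time_stores_us is hypothetical.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

std::atomic<int> atomic_var(0);

// Hypothetical reconstruction of the store test (not the author's code):
// stores the loop counter into atomic_var `iterations` times using the
// memory order given as a template argument, and returns the elapsed
// time in microseconds.
template <std::memory_order Order>
std::int64_t time_stores_us(unsigned iterations)
{
    typedef std::chrono::steady_clock clock;
    const clock::time_point start = clock::now();
    for (unsigned counter = 0; counter != iterations; ++counter)
    {
        atomic_var.store(static_cast<int>(counter), Order);
    }
    const clock::time_point end = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

On x86, compilers typically emit a plain mov for a relaxed store, while a sequentially consistent store needs a full barrier (an xchg, or a mov followed by mfence). That would be consistent with the relaxed store matching the volatile store here and the seq_cst store being much slower.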

1 answer:

Answer 0 (score: 2)

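x86 is a relatively strict memory architecture, so you are likely to see similar performance between the two. You would find a bigger difference on an architecture that allows more reordering, such as POWER.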
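
To illustrate the kind of code where this matters, here is a minimal sketch (not part of the original answer) of a flag-based handoff between two threads; the names payload, ready, producer, consumer and publish_and_read are all hypothetical. On x86 an acquire load and a release store each compile to an ordinary mov, whereas a weaker architecture such as POWER needs extra barrier instructions to give the same guarantee.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready(false);  // flag that publishes the payload

// Writes the payload, then publishes it. The release store guarantees
// the write to payload is visible to any thread that observes
// ready == true through an acquire load.
void producer()
{
    payload = 42;
    ready.store(true, std::memory_order_release);
}

// Spins until the flag is set, then reads the payload. The acquire load
// pairs with the release store above.
int consumer()
{
    while (!ready.load(std::memory_order_acquire))
    {
        // busy-wait; acceptable for a short illustration only
    }
    return payload;
}

// Runs one producer and one consumer and returns what the consumer saw.
int publish_and_read()
{
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return seen;
}
```

If both operations on ready used memory_order_relaxed instead, the standard would no longer guarantee that the consumer sees 42, and a weakly ordered machine is where that could actually be observed.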