Question

For the measurements below I have been using x86_64 GNU/Linux with kernel 4.4.0-109-generic #132-Ubuntu SMP, running on an AMD FX(tm)-8150 Eight-Core Processor (which has a 64-byte cache line size).

The full source code is available here: https://github.com/CarloWood/ai-threadsafe-testsuite/blob/master/src/condition_variable_test.cxx

It does not depend on any other library. Just compile it with:

g++ -pthread -std=c++11 -O3 condition_variable_test.cxx

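What I am trying to do here is measure how long a call to notify_one() takes when one or more threads are actually waiting, compared with how long it takes when no thread is waiting on the condition_variable being used.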
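For comparison, a single-threaded baseline for the no-waiter case might look like the sketch below. The helper name is hypothetical and it is not part of the linked test suite. It only times repeated notify_one() calls on a condition_variable that nobody waits on, so it does not reproduce the multi-threaded contention of the original benchmark, and the figure it returns will depend on the platform and standard library.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>

// Hypothetical helper (not from the original test suite): returns the average
// wall-clock time, in nanoseconds, of one notify_one() call on a
// condition_variable that no thread is waiting on. Single-threaded only.
double average_notify_one_ns_no_waiters(std::size_t iterations)
{
    if (iterations == 0)
        return 0.0;

    std::condition_variable cv;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        cv.notify_one();  // no waiters, so nothing is woken up
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling it from main() with, say, 10000000 iterations gives a rough per-call cost for the uncontended, no-waiter path only.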

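To my surprise, I found that both cases are in the microsecond range: when one thread is waiting, a call takes about 14 to 20 microseconds; when no thread is waiting it apparently takes less, but still at least 1 microsecond.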

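In other words, suppose you have a producer/consumer scenario in which, whenever there is nothing for the consumers to do, you let them call wait(), and whenever the producer writes something new to the queue you call notify_one(), assuming that the implementation of std::condition_variable is smart enough not to spend much time when no thread is waiting in the first place. Then, oh horror, your application will become much slower than the code I wrote to TEST how long a call to notify_one() takes when a thread is waiting!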

It seems that the code I used is necessary to speed up this case. That puzzles me: why isn't the code I wrote already part of std::condition_variable?

The code in question is this. Instead of:

    // Producer thread:
    add_something_to_queue();
    cv.notify_one();

    // Consumer thread:
    if (queue.empty())
    {
      std::unique_lock<std::mutex> lk(m);
      cv.wait(lk);
    }
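A self-contained version of this "always notify" pattern could be sketched as follows. The queue type, the `done` flag and the function name are assumptions made for illustration. The consumer checks its condition while holding the mutex and uses the predicate overload of wait(), which handles spurious wakeups and notifications that arrive before it starts waiting.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical sketch: one producer pushes `count` items and calls
// notify_one() after every push, whether or not the consumer is waiting.
// Returns the number of items the consumer received.
std::size_t run_always_notify(std::size_t count)
{
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::size_t> queue;
    bool done = false;
    std::size_t received = 0;

    std::thread consumer([&] {
        std::unique_lock<std::mutex> lk(m);
        for (;;) {
            // Re-checked under the mutex on every wakeup.
            cv.wait(lk, [&] { return !queue.empty() || done; });
            while (!queue.empty()) {
                queue.pop();
                ++received;
            }
            if (done)  // done is set only after the last push
                return;
        }
    });

    for (std::size_t i = 0; i < count; ++i) {
        {
            std::lock_guard<std::mutex> lk(m);
            queue.push(i);
        }
        cv.notify_one();  // unconditional: the notify cost is paid every time
    }
    {
        std::lock_guard<std::mutex> lk(m);
        done = true;
    }
    cv.notify_one();
    consumer.join();
    return received;
}
```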

you can get a 1000-fold speedup with:

    // Producer thread:
    add_something_to_queue();
    int waiting;
    while ((waiting = s_idle.load(std::memory_order_relaxed)) > 0)
    {
      if (!s_idle.compare_exchange_weak(waiting, waiting - 1, std::memory_order_relaxed, std::memory_order_relaxed))
        continue;
      std::unique_lock<std::mutex> lk(m);
      cv.notify_one();
      break;
    }

    // Consumer thread:
    if (queue.empty())
    {
      std::unique_lock<std::mutex> lk(m);
      s_idle.fetch_add(1, std::memory_order_relaxed);
      cv.wait(lk);
    }
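The same "only notify when a consumer is idle" idea can also be sketched with the idle counter kept under the queue's mutex instead of in a relaxed atomic. This is a variant under stated assumptions, not the code from the linked test suite, and all names are hypothetical. Because the emptiness check, the counter and the push are all done while holding `m`, the producer cannot see `idle == 0` while the consumer is between its emptiness check and wait(). The consumer decrements the counter itself after waking, so spurious wakeups do not make the count drift.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical sketch: returns the number of items received and stores in
// notify_calls how many notify_one() calls were made for pushed items
// (the final shutdown notification is not counted).
std::size_t run_notify_if_idle(std::size_t count, std::size_t& notify_calls)
{
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::size_t> queue;
    bool done = false;
    int idle = 0;  // consumers currently blocked in wait(); guarded by m
    std::size_t received = 0;
    notify_calls = 0;

    std::thread consumer([&] {
        std::unique_lock<std::mutex> lk(m);
        for (;;) {
            while (queue.empty() && !done) {
                ++idle;  // announce that we are about to block
                cv.wait(lk);
                --idle;  // awake again (possibly spuriously)
            }
            while (!queue.empty()) {
                queue.pop();
                ++received;
            }
            if (done)
                return;
        }
    });

    for (std::size_t i = 0; i < count; ++i) {
        bool wake = false;
        {
            std::lock_guard<std::mutex> lk(m);
            queue.push(i);
            wake = idle > 0;  // only pay for notify_one() if someone sleeps
        }
        if (wake) {
            cv.notify_one();
            ++notify_calls;
        }
    }
    {
        std::lock_guard<std::mutex> lk(m);
        done = true;
    }
    cv.notify_one();  // shutdown notification, not counted
    consumer.join();
    return received;
}
```

The producer may occasionally notify once more than strictly needed (if it reads `idle` after a notify but before the woken consumer has reacquired the mutex); that costs time but not correctness.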

Am I making some horrible mistake here? Or are my findings correct?

Edit:

I forgot to add the output of the benchmark program (DIRECT = 0):

    All started!
    Thread 1 statistics: avg: 1.9ns, min: 1.8ns, max: 2ns, stddev: 0.039ns
    The average time spend on calling notify_one() (726141 calls) was: 17995.5 - 21070.1 ns.
    Thread 1 finished.
    Thread Thread Thread 5 finished.
    8 finished.
    7 finished.
    Thread 6 finished.
    Thread 3 statistics: avg: 1.9ns, min: 1.7ns, max: 2.1ns, stddev: 0.088ns
    The average time spend on calling notify_one() (726143 calls) was: 17207.3 - 22278.5 ns.
    Thread 3 finished.
    Thread 2 statistics: avg: 1.9ns, min: 1.8ns, max: 2ns, stddev: 0.055ns
    The average time spend on calling notify_one() (726143 calls) was: 17910.1 - 21626.5 ns.
    Thread 2 finished.
    Thread 4 statistics: avg: 1.9ns, min: 1.6ns, max: 2ns, stddev: 0.092ns
    The average time spend on calling notify_one() (726143 calls) was: 17337.5 - 22567.8 ns.
    Thread 4 finished.
    All finished!

DIRECT = 1:

    All started!
    Thread 4 statistics: avg: 1.2e+03ns, min: 4.9e+02ns, max: 1.4e+03ns, stddev: 2.5e+02ns
    The average time spend on calling notify_one() (0 calls) was: 1156.49 ns.
    Thread 4 finished.
    Thread 5 finished.
    Thread 8 finished.
    Thread 7 finished.
    Thread 6 finished.
    Thread 3 statistics: avg: 1.2e+03ns, min: 5.9e+02ns, max: 1.5e+03ns, stddev: 2.4e+02ns
    The average time spend on calling notify_one() (0 calls) was: 1164.52 ns.
    Thread 3 finished.
    Thread 2 statistics: avg: 1.2e+03ns, min: 1.6e+02ns, max: 1.4e+03ns, stddev: 2.9e+02ns
    The average time spend on calling notify_one() (0 calls) was: 1166.93 ns.
    Thread 2 finished.
    Thread 1 statistics: avg: 1.2e+03ns, min: 95ns, max: 1.4e+03ns, stddev: 3.2e+02ns
    The average time spend on calling notify_one() (0 calls) was: 1167.81 ns.
    Thread 1 finished.
    All finished!

The "0 calls" in the latter output is actually about 20000000 calls.

Is the performance of notify_one() really this bad?
