For the measurements below I have been using x86_64 GNU/Linux with kernel 4.4.0-109-generic #132-Ubuntu SMP, running on an AMD FX(tm)-8150 Eight-Core Processor (which has a 64-byte cache line size).
The full source code is available here: https://github.com/CarloWood/ai-threadsafe-testsuite/blob/master/src/condition_variable_test.cxx
It does not depend on any other libraries. Just compile with:
g++ -pthread -std=c++11 -O3 condition_variable_test.cxx
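What I am trying to do here is measure how long a call to notify_one() takes when one or more threads are actually waiting, compared with how long it takes when no thread is waiting on the condition_variable in use.
To my astonishment, I found that both cases are in the microsecond range: when 1 thread is waiting it takes about 14 to 20 microseconds; when no thread is waiting it apparently takes less, but still at least 1 microsecond.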
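To make the no-waiter case concrete, here is a minimal sketch of how such a call could be timed. It is not the linked test program: the helper name average_notify_cost_ns is made up for illustration, and the loop overhead is included in the result, so the figure slightly overestimates the cost of the call itself.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>

// Hypothetical helper, not part of the linked test suite.
// Returns the average wall-clock time, in nanoseconds, of one notify_one()
// call on a condition variable that no thread is waiting on.
double average_notify_cost_ns(int iterations)
{
  assert(iterations > 0);
  std::condition_variable cv;           // Nobody ever waits on this one.

  auto const start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i)
    cv.notify_one();                    // The call being measured.
  auto const stop = std::chrono::steady_clock::now();

  // Convert the elapsed time to (fractional) nanoseconds and average it.
  std::chrono::duration<double, std::nano> const elapsed = stop - start;
  return elapsed.count() / iterations;
}
```

Calling average_notify_cost_ns(1000000) and printing the result should give a rough per-call figure for a given machine and standard library; the numbers will differ between implementations.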
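In other words, suppose you have a producer/consumer scenario in which consumers call wait() whenever there is nothing for them to do, and the producer calls notify_one() every time it writes something new to the queue, assuming that the implementation of std::condition_variable is smart enough not to spend much time when no thread is waiting in the first place. Then, oh horror, your application will become a lot slower than with the code I wrote to TEST how long a call to notify_one() takes when a thread is waiting!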
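It seems that the code I used is a must for speeding up such scenarios. And that confuses me: why on earth isn't the code I wrote already part of std::condition_variable?
The code in question is the following. Instead of doing: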
// Producer thread:
add_something_to_queue();
cv.notify_one();

// Consumer thread:
if (queue.empty())
{
  std::unique_lock<std::mutex> lk(m);
  cv.wait(lk);
}
you can gain a speedup of a factor of 1000 by doing:
// Producer thread:
add_something_to_queue();
int waiting;
while ((waiting = s_idle.load(std::memory_order_relaxed)) > 0)
{
  if (!s_idle.compare_exchange_weak(waiting, waiting - 1, std::memory_order_relaxed, std::memory_order_relaxed))
    continue;
  std::unique_lock<std::mutex> lk(m);
  cv.notify_one();
  break;
}

// Consumer thread:
if (queue.empty())
{
  std::unique_lock<std::mutex> lk(m);
  s_idle.fetch_add(1, std::memory_order_relaxed);
  cv.wait(lk);
}
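For reference, the same idea (only call notify_one() when a consumer is known to be waiting) can be written as a self-contained sketch. This is a variant, not the code from the linked test: the class name GuardedNotifyQueue and its members are made up for illustration, the waiting counter is only changed while the mutex is held, the wait sits in a predicate loop, the producer does not decrement the counter with a compare-exchange, and the default sequentially consistent ordering is used instead of memory_order_relaxed.

```cpp
#include <atomic>
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical class for illustration; not taken from the linked test program.
class GuardedNotifyQueue
{
  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::deque<int> m_queue;
  std::atomic<int> m_waiting{0};        // Consumers currently inside wait().
  std::atomic<long> m_notify_calls{0};  // Instrumentation only.

 public:
  // Producer side: push under the lock, notify only if someone is waiting.
  void push(int value)
  {
    {
      std::lock_guard<std::mutex> lk(m_mutex);
      m_queue.push_back(value);
    }
    // Consumers only modify m_waiting while holding m_mutex, so a consumer
    // that went to sleep before the push above is visible to this load.
    if (m_waiting.load() > 0)
    {
      m_notify_calls.fetch_add(1);
      m_cv.notify_one();
    }
  }

  // Consumer side: block until an element is available, then return it.
  int pop()
  {
    std::unique_lock<std::mutex> lk(m_mutex);
    while (m_queue.empty())             // Re-check after every wakeup.
    {
      m_waiting.fetch_add(1);
      m_cv.wait(lk);
      m_waiting.fetch_sub(1);
    }
    int value = m_queue.front();
    m_queue.pop_front();
    return value;
  }

  // Number of times push() actually called notify_one().
  long notify_calls() const { return m_notify_calls.load(); }
};
```

With no consumer blocked in pop(), push() costs one mutex lock/unlock plus one atomic load and never reaches notify_one(). A producer may occasionally issue a notify that turns out to be unnecessary (for example while a woken consumer has not yet reacquired the mutex), which costs time but does not affect correctness.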
Am I making some horrible mistake here? Or are my findings correct?
Edit:
I forgot to add the output of the benchmark program (DIRECT = 0):
All started!
Thread 1 statistics: avg: 1.9ns, min: 1.8ns, max: 2ns, stddev: 0.039ns
The average time spend on calling notify_one() (726141 calls) was: 17995.5 - 21070.1 ns.
Thread 1 finished.
Thread Thread Thread 5 finished.
8 finished.
7 finished.
Thread 6 finished.
Thread 3 statistics: avg: 1.9ns, min: 1.7ns, max: 2.1ns, stddev: 0.088ns
The average time spend on calling notify_one() (726143 calls) was: 17207.3 - 22278.5 ns.
Thread 3 finished.
Thread 2 statistics: avg: 1.9ns, min: 1.8ns, max: 2ns, stddev: 0.055ns
The average time spend on calling notify_one() (726143 calls) was: 17910.1 - 21626.5 ns.
Thread 2 finished.
Thread 4 statistics: avg: 1.9ns, min: 1.6ns, max: 2ns, stddev: 0.092ns
The average time spend on calling notify_one() (726143 calls) was: 17337.5 - 22567.8 ns.
Thread 4 finished.
All finished!
DIRECT = 1:
All started!
Thread 4 statistics: avg: 1.2e+03ns, min: 4.9e+02ns, max: 1.4e+03ns, stddev: 2.5e+02ns
The average time spend on calling notify_one() (0 calls) was: 1156.49 ns.
Thread 4 finished.
Thread 5 finished.
Thread 8 finished.
Thread 7 finished.
Thread 6 finished.
Thread 3 statistics: avg: 1.2e+03ns, min: 5.9e+02ns, max: 1.5e+03ns, stddev: 2.4e+02ns
The average time spend on calling notify_one() (0 calls) was: 1164.52 ns.
Thread 3 finished.
Thread 2 statistics: avg: 1.2e+03ns, min: 1.6e+02ns, max: 1.4e+03ns, stddev: 2.9e+02ns
The average time spend on calling notify_one() (0 calls) was: 1166.93 ns.
Thread 2 finished.
Thread 1 statistics: avg: 1.2e+03ns, min: 95ns, max: 1.4e+03ns, stddev: 3.2e+02ns
The average time spend on calling notify_one() (0 calls) was: 1167.81 ns.
Thread 1 finished.
All finished!
The '0 calls' in the latter output are in reality around 20000000 calls.