Is the performance of notify_one() really this bad?

Date: 2018-02-22 20:26:21

Tags: multithreading c++11 concurrency condition-variable

For the measurements below I have been using x86_64 GNU/Linux with kernel 4.4.0-109-generic #132-Ubuntu SMP, running on an AMD FX(tm)-8150 eight-core processor (which has a 64-byte cache line size).

The full source code is available here: https://github.com/CarloWood/ai-threadsafe-testsuite/blob/master/src/condition_variable_test.cxx

It does not depend on any other library. Just compile it with:

g++ -pthread -std=c++11 -O3 condition_variable_test.cxx

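What I am trying to do here is measure how long a call to notify_one() takes when one or more threads are actually waiting, compared with how long it takes when no thread is waiting on the condition_variable being used.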

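To my surprise, I found that both cases are in the microsecond range: when one thread is waiting, it takes roughly 14 to 20 microseconds; when no thread is waiting, it obviously takes less, but still at least 1 microsecond.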

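In other words, suppose you have a producer/consumer scenario in which the consumers call wait() whenever there is nothing for them to do, and the producer calls notify_one() every time it writes something new to the queue, assuming that the implementation of std::condition_variable is smart enough not to spend much time when no thread is waiting in the first place. Then, oh horror, your application becomes a lot slower than the code I wrote to TEST how long a call to notify_one() takes when a thread is waiting!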

It seems that the code I used is a must to speed up this case. That puzzles me: why isn't the code I wrote already part of std::condition_variable?

The code in question is this: instead of

// Producer thread:
add_something_to_queue();
cv.notify_one();

// Consumer thread:
if (queue.empty())
{
  std::unique_lock<std::mutex> lk(m);
  cv.wait(lk);
}

you get a 1000x speedup by doing:

// Producer thread:
add_something_to_queue();
int waiting;
while ((waiting = s_idle.load(std::memory_order_relaxed)) > 0)
{
  if (!s_idle.compare_exchange_weak(waiting, waiting - 1, std::memory_order_relaxed, std::memory_order_relaxed))
    continue;
  std::unique_lock<std::mutex> lk(m);
  cv.notify_one();
  break;
}

// Consumer thread:
if (queue.empty())
{
  std::unique_lock<std::mutex> lk(m);
  s_idle.fetch_add(1, std::memory_order_relaxed);
  cv.wait(lk);
}

Am I making some horrible mistake here, or are my findings correct?

EDIT:

I forgot to add the output of the benchmark program (DIRECT = 0):

All started!
Thread 1 statistics: avg: 1.9ns, min: 1.8ns, max: 2ns, stddev: 0.039ns
The average time spend on calling notify_one() (726141 calls) was: 17995.5 - 21070.1 ns.
Thread 1 finished.
Thread Thread Thread 5 finished.
8 finished.
7 finished.
Thread 6 finished.
Thread 3 statistics: avg: 1.9ns, min: 1.7ns, max: 2.1ns, stddev: 0.088ns
The average time spend on calling notify_one() (726143 calls) was: 17207.3 - 22278.5 ns.
Thread 3 finished.
Thread 2 statistics: avg: 1.9ns, min: 1.8ns, max: 2ns, stddev: 0.055ns
The average time spend on calling notify_one() (726143 calls) was: 17910.1 - 21626.5 ns.
Thread 2 finished.
Thread 4 statistics: avg: 1.9ns, min: 1.6ns, max: 2ns, stddev: 0.092ns
The average time spend on calling notify_one() (726143 calls) was: 17337.5 - 22567.8 ns.
Thread 4 finished.
All finished!

DIRECT = 1:

All started!
Thread 4 statistics: avg: 1.2e+03ns, min: 4.9e+02ns, max: 1.4e+03ns, stddev: 2.5e+02ns
The average time spend on calling notify_one() (0 calls) was: 1156.49 ns.
Thread 4 finished.
Thread 5 finished.
Thread 8 finished.
Thread 7 finished.
Thread 6 finished.
Thread 3 statistics: avg: 1.2e+03ns, min: 5.9e+02ns, max: 1.5e+03ns, stddev: 2.4e+02ns
The average time spend on calling notify_one() (0 calls) was: 1164.52 ns.
Thread 3 finished.
Thread 2 statistics: avg: 1.2e+03ns, min: 1.6e+02ns, max: 1.4e+03ns, stddev: 2.9e+02ns
The average time spend on calling notify_one() (0 calls) was: 1166.93 ns.
Thread 2 finished.
Thread 1 statistics: avg: 1.2e+03ns, min: 95ns, max: 1.4e+03ns, stddev: 3.2e+02ns
The average time spend on calling notify_one() (0 calls) was: 1167.81 ns.
Thread 1 finished.
All finished!

The "0 calls" in the latter output is actually around 20000000 calls.
