Question

Why does the pthread_cond_timedwait documentation talk about an "unavoidable race"?

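The POSIX documentation (IEEE 1003.1, 2013) says:

It is important to note that when pthread_cond_wait() and pthread_cond_timedwait() return without error, the associated predicate may still be false. Similarly, when pthread_cond_timedwait() returns with the timeout error, the associated predicate may be true due to an unavoidable race between the expiration of the timeout and the predicate state change.

(emphasis mine)

We all know the story that a predicate guarded by a condition variable should be checked in a while loop, and that spurious wakeups can happen. But my question is about the word unavoidable, which is a strong word. Why can't such a race be avoided?

Note that if such a race did not exist, we could just check whether pthread_cond_timedwait timed out, instead of checking the predicate again and only then handling the timeout condition. (Assuming, of course, that we get signalled 1) only with the mutex held and 2) only when the predicate has actually changed.)

Wouldn't it be enough to check atomically, with the "user mutex" held, whether we were woken by a timeout or by a signal?

For instance, let's consider an implementation of a condition variable built on top of POSIX. (Error handling and initialization are omitted; you can fill in the obvious gaps.)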
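For reference, here is a minimal sketch of the conventional pattern that the quoted passage implies: the predicate is rechecked under the mutex after every return from pthread_cond_timedwait, including a timeout. Flag, wait_ready and set_ready are hypothetical names used only for illustration, and error handling is omitted.

```cpp
#include <pthread.h>
#include <time.h>

// Hypothetical predicate holder: one boolean guarded by one mutex.
struct Flag {
    pthread_mutex_t mtx;
    pthread_cond_t cv;
    bool ready;

    Flag() : ready(false) {
        pthread_mutex_init(&mtx, nullptr);
        pthread_cond_init(&cv, nullptr);
    }
    ~Flag() {
        pthread_cond_destroy(&cv);
        pthread_mutex_destroy(&mtx);
    }
};

// Waits until f.ready is true or the absolute deadline passes.
// The return value is the predicate itself, read under the mutex,
// so a timeout that races with set_ready() still reports true.
bool wait_ready(Flag &f, const timespec &deadline) {
    pthread_mutex_lock(&f.mtx);
    while (!f.ready) {
        const int ret = pthread_cond_timedwait(&f.cv, &f.mtx, &deadline);
        if (ret != 0)
            break; // ETIMEDOUT (or another error): stop waiting
    }
    const bool ok = f.ready; // recheck the predicate, not the return code
    pthread_mutex_unlock(&f.mtx);
    return ok;
}

// Changes the predicate and signals, both with the mutex held.
void set_ready(Flag &f) {
    pthread_mutex_lock(&f.mtx);
    f.ready = true;
    pthread_cond_signal(&f.cv);
    pthread_mutex_unlock(&f.mtx);
}
```

With this shape the caller never needs to know whether the wait ended by timeout or by signal; the question is whether that final recheck could be dropped.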

#include <algorithm> // std::min
#include <pthread.h>

class CV
{
    pthread_mutex_t mtx;
    pthread_cond_t cv;
    int waiters; // how many threads are sleeping
    int wakeups; // how many times this cv got signalled

public:
    CV();
    ~CV();

    // returns false if it timed out, true otherwise
    bool wait(Mutex *userMutex, struct timespec *timeout)
    {
        pthread_mutex_lock(&mtx);

        waiters++;
        const int oldWakeups = wakeups;

        userMutex->unlock();

        int ret; // 0 on success, non-0 on timeout

        for (;;) {
            // the condition variable comes first, then the mutex
            ret = pthread_cond_timedwait(&cv, &mtx, timeout);
            if (!(ret == 0 && wakeups == 0))
                break; // not spurious
        }

        if (ret == 0) // not timed out
            wakeups--;

        pthread_mutex_unlock(&mtx);

        userMutex->lock();

        pthread_mutex_lock(&mtx);
        waiters--;
        if (ret != 0 && wakeups > oldWakeups) {
            // got a wakeup after a timeout: report the wake instead
            ret = 0;
            wakeups--;
        }
        pthread_mutex_unlock(&mtx);

        return (ret == 0);
    }

    void wake()
    {
        pthread_mutex_lock(&mtx);
        wakeups = std::min(wakeups + 1, waiters);
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&mtx);
    }
};

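It can be shown that

if CV::wait reports a timeout, then we did not get signalled and thus the predicate did not change; and that
if the timeout expires but we get signalled before returning to user code with the user mutex held, then we report a wakeup.

Does the code above contain some serious mistake? If not, is the standard wrong in saying that the race is unavoidable, or must it be making some other assumption that I am missing?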
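One detail when experimenting with the wait() above: the timeout it forwards to pthread_cond_timedwait is an absolute time, measured by default against CLOCK_REALTIME, not a relative duration. A minimal sketch of building such a deadline follows; add_ms and deadline_after_ms are hypothetical helper names, and ms is assumed to be non-negative.

```cpp
#include <time.h>

// Adds a non-negative number of milliseconds to a timespec,
// keeping tv_nsec within [0, 1'000'000'000).
timespec add_ms(timespec base, long ms) {
    const long kNsPerSec = 1000000000L;
    base.tv_sec += ms / 1000;
    base.tv_nsec += (ms % 1000) * 1000000L;
    if (base.tv_nsec >= kNsPerSec) {
        base.tv_sec += 1;
        base.tv_nsec -= kNsPerSec;
    }
    return base;
}

// Absolute deadline `ms` milliseconds from now on CLOCK_REALTIME,
// suitable as the abstime argument of pthread_cond_timedwait when
// the condition variable uses the default clock.
timespec deadline_after_ms(long ms) {
    timespec now;
    clock_gettime(CLOCK_REALTIME, &now);
    return add_ms(now, ms);
}
```

A caller would then pass the address of the returned timespec as the timeout argument.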

Answer 1

First, note that this part is generally dangerous:

pthread_mutex_unlock(&mtx);
// Trouble is here
userMutex->lock();

pthread_mutex_lock(&mtx);

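At the commented point, anything can happen. You hold no locks. The power of condition variables is that a thread using one is always either holding the lock or waiting.

Then there is the issue at hand, the unavoidable race: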

if (ret != 0 && wakeups > oldWakeups) {
    // got a wakeup after a timeout: report the wake instead
    ret = 0;
    wakeups--;    
}

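There is no guarantee about the order in which a group of threads waiting on a pthread_cond_t gets woken, and that wreaks havoc on your counts: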

Thread1           Thread2        Thread3
{lock userMtx in calling code}
{lock mtx}
waiters++ (=1)
oldWakeups = 0
{unlock userMtx }
wait {unlock mtx}
                  {lock userMtx in calling code}
                  {lock mtx}
                  signal_all
                  wakeups = 1
                  {unlock mtx}
                  {unlock userMtx in calling code}
timeout (unavoid. race case) {lock mtx}
{unlock mtx}
                                  {lock userMtx in calling code}
                                  {lock mtx}
                                  waiters++ (=2)
                                  oldWakeups = 1
                                  {unlock userMtx }
                                  wait {unlock mtx}

                                  timeout {lock mtx}
                                  {unlock mtx}
                                  {lock userMtx}
                                  {lock mtx}
                                  waiters-- (=1)
                                  wakeups-- (=0)*
                                  {unlock mtx}
                                  {unlock userMtx in calling code}
{lock userMtx}
{lock mtx}
waiters-- (=0)
wakeups == oldWakeups (=0)
{unlock mtx}
{unlock userMtx in calling code}

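At this point, on thread 1, oldWakeups == wakeups, so the check for the unavoidable race case fails to notice the race, which recreates the unavoidable race case. The cause is thread 3 stealing the signal meant for thread 1: thread 3 (a true timeout) ends up looking like a signal, and thread 1 (a signal racing with a timeout) ends up looking like a timeout.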

Answer 2

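Your implementation does not prevent the possibility of a spurious TIMEOUT when a thread has been signaled. You decrement wakeups immediately when the cond_wait succeeds, and you decrement wakeups on a failed cond_wait if it looks like there was a signal for you (wakeups is higher than before). However, the arithmetic meant to ensure that a signal reaches the thread it was intended for does not actually do so.

The problem lies in the race case, where you unlock all the mutexes after the wait: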

if (ret == 0)
    wakeups--;

pthread_mutex_unlock(&mtx);

// no locks held.  If interrupted, ANYTHING can happen

userMutex->lock();

pthread_mutex_lock(&mtx);

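Now, to define success and failure, I have to state that your cond_wait spans from the initial pthread_mutex_lock to the final pthread_mutex_unlock. That has to be the case if you want to claim there is no race in which a signal looks like a timeout. If you manage to prevent a racy timeout in pthread_cond_wait only to introduce another racy timeout of your own, no problem has been solved.

So what must be shown is that there is a case where a thread is signaled while it is running, yet the wakeups check fails. It turns out the easiest way is to have one thread steal another thread's wakeup. Three threads will wait, and one will signal twice. The trick is to abuse the min() in wake(). It also relies on a race between two cond_waits: one of them must acquire mtx, and it is undefined which one succeeds. In this case I assume the worst case (which you are always entitled to do when proving a race case).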

initial state {
   waiters = 0
   wakeups = 0
}

Thread 1     Thread 2    Thread 3      Thread 4
1: {acquire userMutex}
1: wait(...) {
1:   {acquire mtx}
1:   {release userMutex}
1:   waiters++; // = 1
1:   oldWakeups = wakeups; // 0
1:   pthread_cond_wait // releases mtx
1:   pthreads TIMES OUT // acquires mtx
1:   sees timeout
1:   {release mtx}
1:   // world's worst context switch occurs here
             2: {acquire userMutex}
             2: wait(...) {
             2:   {acquire mtx}
             2:   {release userMutex}
             2:   waiters++; // = 2
             2:   oldWakeups = wakeups; // = 0
             2:   pthread_cond_wait // releases mtx
                         3:  {acquires userMutex}
                         3:  wait(...) {
                         3:    {acquire mtx}
                         3:    {release userMutex}
                         3:    waiters++; // = 3
                         3:    oldWakeups = wakeups; // = 0
                         3:    pthread_cond_wait // releases mtx
                                       4:  {acquire userMtx}
                                       4:  wake() {
                                       4:    {acquire mtx}
                                       4:    wakeups = min(wakeups + 1, waiters);
                                       4:    //      = min(0 + 1, 3) = 1
                                       4:    pthread_cond_signal
                                       4:    {release mtx}
                                       4:  }
                                       4:  {release userMtx}
 RACE:       2: TIMEOUT  3: SIGNALED
 RACE:       both of these threads need to acquire mtx
             2:   {acquires mtx}
             2:   sees that it times out
             2:   if (timeout && (wakeups > oldWakeups)) { // (1 > 0)
             2:     // thinks the wakeup was for this thread
             2:     waiters--; // = 2
             2:     wakeups--; // = 0
             2:   }
             2:   {releases mtx}
             2:   returns SIGNALED;
             2: }
             2: {releases userMtx}
                         3:    {acquires mtx}
                         3:    sees that it was signaled
                         3:    wakeups--; // = -1 ... UH OH!
                         3:    waiters--; // = 1
                         3:    {releases mtx}
                         3:    returns SIGNALED;
                         3:  }
                         3:  {releases userMtx}

 --- some synchronization which makes it clear that both thread 2 ---
 --- and thread 3 were signaled occurs here.  Thread 1 is still   ---
 --- technically waiting in limbo.  User decides to wake it up.   ---

                                       4:  {acquire userMtx}
                                       4:  wake() {
                                       4:    {acquire mtx}
                                       4:    wakeups = min(wakeups + 1, waiters);
                                       4:    //      = min(-1 + 1, 1) = 0  !!!
                                       4:    pthread_cond_signal
                                       4:    {release mtx}
                                       4:  }
                                       4:  {release userMtx}
1:   {acquire userMtx}
1:   {acquire mtx}
1:   waiters--; // = 0
1:   if (timeout && (wakeups > oldWakeups)) {..}  (0 > 0)
1:   // no signal detected
1:   {release mtx}
1:   return TIMEOUT;
1: }
1: {release userMtx}

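Because an interesting race case managed to send wakeups to -1, the trick for avoiding lost signals does not work. pthread_cond_signal is allowed to wake more than one thread, so waking both thread 2 and thread 3 at the same time is legal. The second signal, however, clearly has only one thread left to signal, so it must signal thread 1. Yet we returned TIMEOUT, producing the infamous unavoidable race case.

As far as I can tell, the harder you try to pin these wakeups to the correct thread, the more ways you end up dropping all the mutexes, and the more deadly it becomes to be technically not waiting on any condition variable.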
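As a small check of the arithmetic only, the two wake() updates and the final timeout check from the trace can be evaluated in isolation. This sketch does not model the threads or the rest of wait(); wakeups_after_wake and looks_signalled are hypothetical names for the expressions `min(wakeups + 1, waiters)` and `wakeups > oldWakeups` taken from the question's code.

```cpp
#include <algorithm>

// The update performed by wake(): increment wakeups, clamped to waiters.
int wakeups_after_wake(int wakeups, int waiters) {
    return std::min(wakeups + 1, waiters);
}

// The check a timed-out waiter performs after reacquiring both mutexes.
bool looks_signalled(int wakeups, int oldWakeups) {
    return wakeups > oldWakeups;
}
```

Plugging in the trace's values: the first wake() takes (wakeups = 0, waiters = 3) to 1, and the second takes (wakeups = -1, waiters = 1) to 0, so thread 1, which recorded oldWakeups = 0, evaluates 0 > 0 and reports a timeout.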

Answer 3

FYI, an interesting entry on the same topic:

http://woboq.com/blog/qwaitcondition-solving-unavoidable-race.html

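The only way to solve this problem is if we can order the threads by when they started waiting.

Inspired by bitcoin's blockchain, I created a linked list of nodes, living on the threads' stacks, that represents that order. When a thread starts to wait, it adds itself to the end of the doubly linked list. When a thread wakes other threads, it marks the last node of the linked list (by incrementing a woken counter inside the node). When a thread times out, it checks whether it was marked, or whether any other thread in the linked list was. Only in that case do we resolve the race; otherwise we consider it a timeout.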
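The bookkeeping described above can be sketched roughly as follows. This is not the Qt patch: WaitNode, WaitList and their member names are hypothetical, every call is assumed to happen with the condition variable's internal mutex held, and scanning from the timed-out node towards the tail is one reading of the description. Consuming marks on a normal wakeup and moving marks when a marked node is removed are left out.

```cpp
// One node per waiting thread; in the description it lives on that thread's stack.
struct WaitNode {
    WaitNode *prev = nullptr;
    WaitNode *next = nullptr;
    int woken = 0; // wakes recorded while this node was the last one
};

struct WaitList {
    WaitNode *head = nullptr;
    WaitNode *tail = nullptr;

    // A thread that starts waiting appends itself at the end.
    void append(WaitNode *n) {
        n->prev = tail;
        n->next = nullptr;
        if (tail) tail->next = n; else head = n;
        tail = n;
    }

    // A thread that stops waiting unlinks itself.
    void remove(WaitNode *n) {
        if (n->prev) n->prev->next = n->next; else head = n->next;
        if (n->next) n->next->prev = n->prev; else tail = n->prev;
        n->prev = n->next = nullptr;
    }

    // A waker marks the most recent waiter; false if nobody is waiting.
    bool mark_last() {
        if (!tail) return false;
        ++tail->woken;
        return true;
    }

    // On timeout: a mark on this node or a later one means the wake was
    // issued while this thread was already waiting, so it may claim it.
    bool claim_wake_on_timeout(WaitNode *self) {
        for (WaitNode *n = self; n; n = n->next) {
            if (n->woken > 0) {
                --n->woken;
                return true; // resolve the race: treat as woken
            }
        }
        return false; // genuine timeout
    }
};
```

Under this reading, a thread that began waiting after the wake was issued sits behind the marked node, so it cannot claim that wake when it times out.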

https://codereview.qt-project.org/#/c/66810/

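This patch adds quite a bit of code to add and remove nodes in the linked list, and also to walk the list to check whether we were indeed woken up. The length of the linked list is bounded by the number of waiting threads. I was expecting this linked-list handling to be negligible compared to the other costs of QWaitCondition.

However, the results of the QWaitCondition benchmark show that, with 10 threads and high contention, there is a ~10% penalty. With 5 threads the penalty is ~5%.

Is it worth paying this penalty to solve the race? So far, we have decided not to merge the patch and to keep the race.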

Why does the pthread_cond_timedwait doc talk about an "unavoidable race"?
