Question

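I am studying Michael & Scott's lock-free queue algorithm and trying to implement it in C++.

But my code has a race, and I think the race may be in the algorithm itself.

I read the paper here: Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms. The original Dequeue pseudocode is as follows: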

dequeue(Q: pointer to queue_t, pvalue: pointer to data type): boolean
D1:   loop                          // Keep trying until Dequeue is done
D2:      head = Q->Head             // Read Head
D3:      tail = Q->Tail             // Read Tail
D4:      next = head.ptr->next      // Read Head.ptr->next
D5:      if head == Q->Head         // Are head, tail, and next consistent?
D6:         if head.ptr == tail.ptr // Is queue empty or Tail falling behind?
D7:            if next.ptr == NULL  // Is queue empty?
D8:               return FALSE      // Queue is empty, couldn't dequeue
D9:            endif
                // Tail is falling behind.  Try to advance it
D10:            CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)
D11:         else                    // No need to deal with Tail
               // Read value before CAS
               // Otherwise, another dequeue might free the next node
D12:            *pvalue = next.ptr->value
               // Try to swing Head to the next node
D13:            if CAS(&Q->Head, head, <next.ptr, head.count+1>)
D14:               break             // Dequeue is done.  Exit loop
D15:            endif
D16:         endif
D17:      endif
D18:   endloop
D19:   free(head.ptr)                // It is safe now to free the old node
D20:   return TRUE                   // Queue was not empty, dequeue succeeded

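As I see it, the race goes like this:

Thread 1 advances to D3 and then stops.
Thread 2 advances to D3 and reads the same head as Thread 1.
Thread 2 luckily advances all the way to D20; at D19 it frees head.ptr.
Thread 1 resumes and advances to D4, trying to read head.ptr->next, but because Thread 2 has already freed head.ptr, it crashes.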

My C++ code always crashes at D4 in Thread 1.
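To make the interleaving concrete, here is a single-threaded replay of it. This is only a sketch: the `freed` flag stands in for free(), and `Node`, `Queue`, `dequeue_thread2` and `replay_race` are hypothetical names, not part of the paper.

```cpp
#include <cassert>

// Hypothetical, simplified node: the `freed` flag stands in for free().
struct Node {
    int   value = 0;
    Node* next  = nullptr;
    bool  freed = false;
};

struct Queue {
    Node* head;
    Node* tail;
};

// A full dequeue as Thread 2 would run it. The replay is single-threaded,
// so plain assignments stand in for the CAS operations.
bool dequeue_thread2(Queue& q, int& out) {
    Node* head = q.head;                 // D2
    Node* next = head->next;             // D4
    if (next == nullptr) return false;   // D7-D8: queue is empty
    out = next->value;                   // D12
    q.head = next;                       // D13
    head->freed = true;                  // D19: stands in for free(head.ptr)
    return true;
}

// Returns true if Thread 1's D4 would touch a node that was already freed.
bool replay_race() {
    Node dummy, a;
    a.value = 42;
    dummy.next = &a;
    Queue q{&dummy, &a};

    Node* t1_head = q.head;              // Thread 1: D2
    Node* t1_tail = q.tail;              // Thread 1: D3, then it is preempted
    (void)t1_tail;

    int v = 0;
    dequeue_thread2(q, v);               // Thread 2: runs D1..D20

    return t1_head->freed;               // Thread 1 resumes at D4 on a freed node
}
```

Calling `replay_race()` reports that the node Thread 1 is about to dereference at D4 has already been handed to free() by Thread 2.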

Can anyone point out my mistake and give some explanation?

Answer 1

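Thanks, very interesting topic! It definitely looks like a bug, but one of the paper's authors claims that their free() is not the normal free() we all live with, but some magic free(), so there is no bug. Fantastic.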

See http://blog.shealevy.com/2015/04/23/use-after-free-bug-in-maged-m-michael-and-michael-l-scotts-non-blocking-concurrent-queue-algorithm/
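As far as I understand it, the "magic" is that nodes come from the queue's own free list and are recycled as queue nodes instead of being returned to the system, so a stale read at D4 still hits valid memory and the check at D5 discards it. A minimal sketch of that idea, assuming a hypothetical `NodePool` (mutex-protected here for brevity, which is my simplification and not what the paper does):

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <vector>

struct Node {
    std::atomic<int>   value{0};
    std::atomic<Node*> next{nullptr};
};

// Hypothetical pool: released nodes are kept for reuse and are only
// deleted when the pool itself is destroyed.
class NodePool {
public:
    ~NodePool() { for (Node* n : all_) delete n; }

    Node* allocate() {
        std::lock_guard<std::mutex> lock(m_);
        if (!free_.empty()) {
            Node* n = free_.back();      // reuse a previously released node
            free_.pop_back();
            n->next.store(nullptr);
            return n;
        }
        Node* n = new Node;
        all_.push_back(n);               // remember it so the pool can delete it
        return n;
    }

    // The "magic free()": the node goes back to the pool, not to the system.
    void release(Node* n) {
        std::lock_guard<std::mutex> lock(m_);
        free_.push_back(n);
    }

private:
    std::mutex         m_;
    std::vector<Node*> free_;
    std::vector<Node*> all_;
};

// Returns true if a released node can still be read and is handed out again.
bool stale_read_is_safe() {
    NodePool pool;
    Node* n     = pool.allocate();
    Node* stale = n;                     // Thread 1's copy of head.ptr
    pool.release(n);                     // Thread 2 reaches D19
    Node* next  = stale->next.load();    // Thread 1's D4: memory is still a Node
    Node* again = pool.allocate();       // the same node comes back
    return next == nullptr && again == stale;
}
```

The price of recycling is that the same address can reappear in the queue, which is why the pseudocode pairs every pointer with a count (head.count+1 at D13).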

I hope nobody put this into production without a thorough analysis.

Answer 2

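Actually, this is the problem of non-blocking memory reclamation, which has been studied for years since Maged Michael, one of the authors of the MS queue, introduced hazard pointers [1].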

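Hazard pointers let a thread reserve blocks so that other threads do not actually reclaim them until it is done with them. However, this mechanism carries a non-trivial performance overhead.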
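Roughly, the reservation works like this. This is a minimal sketch of the idea and not the code from [1]; `g_hazard`, `protect`, `clear`, `scan` and the fixed `kMaxThreads` are names and assumptions I made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct Node {
    int                value = 0;
    std::atomic<Node*> next{nullptr};
};

constexpr int kMaxThreads = 4;                    // assumption: fixed thread count
std::atomic<Node*> g_hazard[kMaxThreads] = {};    // one hazard slot per thread

// Publish the node we are about to dereference, then re-read the source
// to make sure it was not swapped out before the reservation became visible.
Node* protect(int tid, const std::atomic<Node*>& src) {
    Node* p = src.load();
    for (;;) {
        g_hazard[tid].store(p);
        Node* again = src.load();
        if (again == p) return p;
        p = again;
    }
}

void clear(int tid) { g_hazard[tid].store(nullptr); }

bool is_hazardous(Node* p) {
    for (auto& h : g_hazard)
        if (h.load() == p) return true;
    return false;
}

// Deletes every retired node that no thread has reserved.
// Returns how many nodes were actually freed.
int scan(std::vector<Node*>& retired) {
    int freed = 0;
    std::vector<Node*> keep;
    for (Node* p : retired) {
        if (is_hazardous(p)) keep.push_back(p);
        else { delete p; ++freed; }
    }
    retired.swap(keep);
    return freed;
}

// Single-threaded walk-through of the race from the question.
bool hazard_demo() {
    std::atomic<Node*> head{new Node};
    std::vector<Node*> retired;
    Node* p = protect(0, head);      // Thread 1 reserves head before D4
    retired.push_back(p);            // Thread 2 at D19: retire instead of free
    int first = scan(retired);       // nothing freed: the node is reserved
    clear(0);                        // Thread 1 is done with the node
    int second = scan(retired);      // now the node is reclaimed
    return first == 0 && second == 1;
}
```

The overhead mentioned above comes from the store and re-read in `protect` on every access, plus the walk over all hazard slots in `scan`.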

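There are also many variants of epoch-based reclamation, such as RCU [2,3], and more recently interval-based reclamation (IBR) [4]. They avoid use-after-free by reserving epochs, and are faster than hazard pointers. As far as I know, epoch-based reclamation is widely used to deal with this problem.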
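To give a feel for the epoch idea, here is a minimal sketch under strong simplifying assumptions: a fixed number of threads, one shared retire list, and a single thread that advances the epoch and collects. `EpochDomain` and all of its members are hypothetical names, not an API from [2], [3] or [4].

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

struct Node { int value = 0; };

constexpr int           kMaxThreads = 4;           // assumption: fixed thread count
constexpr std::uint64_t kIdle       = UINT64_MAX;  // thread is outside the queue

class EpochDomain {
public:
    EpochDomain() { for (auto& s : local_) s.store(kIdle); }
    ~EpochDomain() { for (auto& r : retired_) delete r.node; }

    // A thread reserves the current epoch for the duration of one operation.
    void enter(int tid) { local_[tid].store(global_.load()); }
    void leave(int tid) { local_[tid].store(kIdle); }

    // Called instead of free(): remember the node and the epoch it left in.
    void retire(Node* n) { retired_.push_back({n, global_.load()}); }

    // The epoch may advance only if every active thread has seen the current one.
    bool try_advance() {
        std::uint64_t e = global_.load();
        for (auto& s : local_) {
            std::uint64_t l = s.load();
            if (l != kIdle && l != e) return false;
        }
        global_.store(e + 1);   // single-reclaimer assumption; otherwise use CAS
        return true;
    }

    // Frees nodes retired at least two epochs ago. Returns how many were freed.
    int collect() {
        std::uint64_t e = global_.load();
        int freed = 0;
        std::vector<Retired> keep;
        for (const Retired& r : retired_) {
            if (r.epoch + 2 <= e) { delete r.node; ++freed; }
            else keep.push_back(r);
        }
        retired_.swap(keep);
        return freed;
    }

private:
    struct Retired { Node* node; std::uint64_t epoch; };
    std::atomic<std::uint64_t> global_{0};
    std::atomic<std::uint64_t> local_[kMaxThreads];
    std::vector<Retired>       retired_;
};

// Single-threaded walk-through of the race from the question.
bool epoch_demo() {
    EpochDomain d;
    Node* n = new Node;
    d.enter(0);                          // Thread 1 is inside dequeue, epoch 0
    d.retire(n);                         // Thread 2 at D19: retire in epoch 0
    bool adv1 = d.try_advance();         // ok: the reader has seen epoch 0
    bool adv2 = d.try_advance();         // blocked: the reader is still in epoch 0
    int  during = d.collect();           // nothing freed while the reader is active
    d.leave(0);
    bool adv3 = d.try_advance();         // the reader left, the epoch moves on
    int  after = d.collect();            // the node is two epochs old: freed
    return adv1 && !adv2 && during == 0 && adv3 && after == 1;
}
```

Compared with hazard pointers, a reader pays once per operation (`enter`/`leave`) rather than once per pointer it follows, but a stalled reader holds back reclamation for everyone.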

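You can look at the papers listed below for more details. The Interval-Based Memory Reclamation paper discusses a lot of the background.

This is a general problem in non-blocking data structures, and we usually do not consider it a bug in the data structure itself; after all, it only occurs in languages with manual memory management such as C/C++, and not in languages like Java (BTW, the Michael & Scott queue has been part of Java's concurrency library for years).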

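References:

[1] Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects, Maged M. Michael, IEEE Transactions on Parallel and Distributed Systems, 2004.

[2] Performance of Memory Reclamation for Lockless Synchronization, Thomas E. Hart et al., Journal of Parallel and Distributed Computing, 2007.

[3] Read-Copy Update, Paul E. McKenney et al., Ottawa Linux Symposium, 2002.

[4] Interval-Based Memory Reclamation, Haosen Wen et al., Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP).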

Explain Michael & Scott lock-free queue algorithm
