Question

我正在研究Google的细丝工作系统。目前，我正在研究他们实施的WorkStealingDequeue。您可以查看完整的源代码here。此数据结构基于此work。在实现弹出和窃取的过程中，他们将memory_order_seq_cst用作完整的内存屏障。

template <typename TYPE, size_t COUNT>
TYPE WorkStealingDequeue<TYPE, COUNT>::pop() noexcept {
    // mBottom is only written in push(), which cannot be concurrent with pop(),
    // however, it is read in steal(), so we need basic atomicity.
    //   i.e.: bottom = mBottom--;
    int32_t bottom = mBottom.fetch_sub(1, std::memory_order_relaxed) - 1;

    // we need a full memory barrier here; mBottom must be written and visible to
    // other threads before we read mTop.
    int32_t top = mTop.load(std::memory_order_seq_cst);

    if (top < bottom) {
        // Queue isn't empty and it's not the last item, just return it.
        return getItemAt(bottom);
    }

    TYPE item{};
    if (top == bottom) {
        // We took the last item in the queue
        item = getItemAt(bottom);

        // Items can be added only in push() which isn't concurrent to us, however we could
        // be racing with a steal() -- pretend to steal from ourselves to resolve this
        // potential conflict.
        if (mTop.compare_exchange_strong(top, top + 1,
                std::memory_order_seq_cst,
                std::memory_order_relaxed)) {
            // success: mTop was equal to top, mTop now equals top+1
            // We successfully poped an item, adjust top to make the queue canonically empty.
            top++;
        } else {
            // failure: mTop was not equal to top, which means the item was stolen under our feet.
            // top now equals to mTop. Simply discard the item we just poped.
            // The queue is now empty.
            item = TYPE();
        }
    }

    // no concurrent writes to mBottom possible
    mBottom.store(top, std::memory_order_relaxed);
    return item;
}

template <typename TYPE, size_t COUNT>
TYPE WorkStealingDequeue<TYPE, COUNT>::steal() noexcept {
    do {
        // mTop must be read before mBottom
        int32_t top = mTop.load(std::memory_order_seq_cst);

        // mBottom is written concurrently to the read below in pop() or push(), so
        // we need basic atomicity. Also makes sure that writes made in push()
        // (prior to mBottom update) are visible.
        int32_t bottom = mBottom.load(std::memory_order_acquire);

        if (top >= bottom) {
            // queue is empty
            return TYPE();
        }

        // The queue isn't empty
        TYPE item(getItemAt(top));
        if (mTop.compare_exchange_strong(top, top + 1,
                std::memory_order_seq_cst,
                std::memory_order_relaxed)) {
            // success: we stole a job, just return it.
            return item;
        }
        // failure: the item we just tried to steal was pop()'ed under our feet,
        // simply discard it; nothing to do.
    } while (true);
}

为使实现正确，要求在pop（）中的mTop之前先获取mBottom，在steal（）中的mBottom之前先获取mTop。如果像大多数实现一样，将memory_order_seq_cst视为完整的内存屏障，那么上面的代码是正确的。但是据我了解，C ++ 11并没有说将memory_order_seq_cst作为完整的内存屏障。据我了解，为了确保正确的排序，那么mBottom fetch_sub操作必须至少为std :: memory_order_acq_rel。我的分析正确吗？

然后在mTop上需要memory_order_seq_cst吗？ memory_order_seq_cst强制mTop上的所有操作都按单个总订单（STO）执行。但是在这种情况下，唯一参与STO的是mTop。我相信我们已经有了修改顺序保证，其中规定每个线程必须就每个变量相对于其自身的修改顺序达成一致。 compare_exchange_strong操作中的memory_order_acq_rel是否足够？

Answer 1

此代码在steal中具有数据竞争，因此，不管内存顺序如何，其行为都是不确定的。

没有什么可以阻止窃取线程调用getItemAt(top)读取给定索引处的值，而拥有队列的工作线程调用push的次数足以缠绕缓冲区并覆盖条目，或调用pop多次以清空队列，然后调用push覆盖该条目。

例如mTop为0，mBottom为1 =>队列中只有一个元素。

隐藏线程读取mTop和mBottom。 top<bottom，因此它进入呼叫getItemAt(top)，由于任务切换而被操作系统挂起。

工作线程调用pop。它读取mBottom并将bottom设置为0。然后读取top（0）。 0==0，因此我们调用getItemAt(bottom)来检索项目。然后将mTop增至1，并将mBottom设为1。

然后，工作线程调用push并调用setItemAt(mBottom)来设置下一个元素，现在是元素1。

工作线程现在重复push次此pop / COUNT舞步，因此队列中的元素永远不会超过一个，但是每次递增mTop和{{ 1}}，因此活动元素将在缓冲区周围移动，直到mBottom再次为0。

工作线程调用mBottom & MASK，然后调用push，后者访问元素0。操作系统恢复窃取线程，后者也正在访问元素0 =>读取和写入相同位置而无需排序= >数据竞争和不确定的行为。

只有对于某些setItemAt(mBottom)，如果TYPE是std::atomic<T>，这是可以的。

假设T足够大，实际上这是永远不会发生的，则COUNT用push写到mBottom，而memory_order_release用{{ 1}}。这意味着对相关数据项的写操作发生在steal中读取该数据项之前，因此可以读取该项目。即使使用memory_order_acquire，steal中的fetch_sub也可以看到，这是由于“释放序列”的概念。

在负载上使用pop并成功进行memory_order_relaxed的比较交换将memory_order_seq_cst上的操作强制为单个全局总顺序。但是，对mTop中mTop的加载的评论是错误的：使用mTop不会阻止对pop调用进行重新排序，因为这是一个{{1 }}从memory_order_seq_cst开始，并且mBottom.fetch_sub调用使用load。 mTop上的fetch_sub不会对从同一线程到其他变量的非memory_order_relaxed写入施加任何顺序。

目前我不确定这可能会对代码产生什么影响。

Google的WorkStealingDequeue使用memory_order_seq_cst作为完整的内存屏障。有效吗

1 个答案: