I'm trying to write a mutex-free (but not lock-free) queue that uses a contiguous memory range as a circular buffer, plus four pointers: two for the consumers and two for the producers. It keeps one empty slot after the most recently pushed element to disambiguate a full queue from an empty one. Here is the implementation:
#include <atomic>
#include <cstddef>
#include <memory>
#include <utility>

template <typename T, typename Allocator = std::allocator<T>>
class concurrent_queue
{
protected:
    T *storage;
    std::size_t s;
    std::atomic<T*> consumer_head, producer_head;
    // pointer plus ABA counter, packed so the pair can be swapped with a double-width CAS
    union alignas(16) dpointer
    {
        struct
        {
            T *ptr;
            std::size_t cnt;
        };
        __int128 val;
    };
    dpointer consumer_pending, producer_pending;
    Allocator alloc;
public:
    concurrent_queue(std::size_t s): storage(nullptr), consumer_head(nullptr), producer_head(nullptr)
    {
        // one extra slot so a full queue can be told apart from an empty one
        storage = alloc.allocate(s+1);
        consumer_head = storage;
        __atomic_store_n(&(consumer_pending.val), (dpointer{storage, 0}).val, __ATOMIC_SEQ_CST);
        producer_head = storage;
        __atomic_store_n(&(producer_pending.val), (dpointer{storage, 0}).val, __ATOMIC_SEQ_CST);
        this->s = s + 1;
    }
    ~concurrent_queue()
    {
        while(consumer_head != producer_head)
        {
            alloc.destroy(consumer_head.load());
            ++consumer_head;
            if(consumer_head == storage + s)
                consumer_head = storage;
        }
        alloc.deallocate(storage, s);
    }
    template <typename U>
    bool push(U&& e)
    {
        while(true)
        {
            dpointer a;
            a.val = __atomic_load_n(&(producer_pending.val), __ATOMIC_RELAXED);
            std::atomic_thread_fence(std::memory_order_acquire);
            auto b = consumer_head.load(std::memory_order_relaxed);
            auto next = a.ptr + 1;
            if(next == storage + s) next = storage;
            if(next == b) continue; // buffer appears full, retry
            // try to reserve the slot at a.ptr by advancing producer_pending
            dpointer newval{next, a.cnt+1};
            if(!__atomic_compare_exchange_n(&(producer_pending.val), &(a.val), (newval.val), true, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) continue;
            alloc.construct(a.ptr, std::forward<U>(e));
            // spin until producer_head has been advanced to next
            while(!producer_head.compare_exchange_weak(a.ptr, next, std::memory_order_release, std::memory_order_relaxed));
            return true;
        }
    }
    template <typename U>
    bool pop(U& result)
    {
        while(true)
        {
            dpointer a;
            a.val = __atomic_load_n(&(consumer_pending.val), __ATOMIC_RELAXED);
            std::atomic_thread_fence(std::memory_order_acquire);
            auto b = producer_head.load(std::memory_order_relaxed);
            auto next = a.ptr + 1;
            if(next == storage + s) next = storage;
            if(a.ptr == b) continue; // buffer appears empty, retry
            // try to reserve the slot at a.ptr by advancing consumer_pending
            dpointer newval{next, a.cnt+1};
            if(!__atomic_compare_exchange_n(&(consumer_pending.val), &(a.val), (newval.val), true, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) continue;
            result = std::move(*(a.ptr));
            alloc.destroy(a.ptr);
            // spin until consumer_head has been advanced to next
            while(!consumer_head.compare_exchange_weak(a.ptr, next, std::memory_order_release, std::memory_order_relaxed));
            return true;
        }
    }
};
However, when I test it with equal numbers of dedicated push and pop threads, each pushing/popping the same predetermined number of elements before terminating, some pop threads sometimes (not always) get stuck at the first CAS at some point during the run and never terminate, even after all push threads have finished. Since the pop threads try to pop exactly as many elements as the push threads push, I suspect that an overwrite happens in a push thread somewhere.
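The test roughly follows the shape sketched below (this is only a sketch: the thread count N, per-thread element count K, queue capacity, and int payload are placeholders, not the actual test code):

#include <thread>
#include <vector>

void stress_test()
{
    constexpr std::size_t N = 4;      // push threads and pop threads per side (placeholder)
    constexpr std::size_t K = 100000; // elements per thread (placeholder)
    concurrent_queue<int> q(1024);    // capacity is a placeholder

    std::vector<std::thread> threads;
    for(std::size_t i = 0; i < N; ++i)
        threads.emplace_back([&]{ for(std::size_t j = 0; j < K; ++j) q.push(int(j)); });
    for(std::size_t i = 0; i < N; ++i)
        threads.emplace_back([&]{ int v; for(std::size_t j = 0; j < K; ++j) q.pop(v); });
    for(auto& t : threads) t.join();
}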
This is my first attempt at writing a concurrent container, so I'm very inexperienced with this... I've been staring at the code for a while and can't figure out what's wrong. Could someone with more experience in this area take a look?
Also, is there a less platform-specific way to get a double-width CAS?
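(The closest standard-only route I can think of is something like the sketch below: wrapping the pointer/counter pair in a std::atomic and letting the library pick the instruction. Whether it is actually lock-free still depends on the platform and on flags such as GCC/Clang's -mcx16; the names tagged_ptr and advance here are purely illustrative.)

// Sketch only: std::atomic over a 16-byte struct as a portable double-width CAS.
// It may fall back to a lock if the target lacks a 16-byte CAS; check is_lock_free().
#include <atomic>
#include <cstddef>

template <typename T>
struct alignas(16) tagged_ptr
{
    T *ptr;
    std::size_t cnt;
};

template <typename T>
bool advance(std::atomic<tagged_ptr<T>> &slot, T *next)
{
    tagged_ptr<T> expected = slot.load(std::memory_order_relaxed);
    tagged_ptr<T> desired{next, expected.cnt + 1};
    return slot.compare_exchange_weak(expected, desired,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed);
}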
Answer 0 (score: 1):
Edit: most of this post is actually wrong. See the comments.
dpointer a;
a.val = __atomic_load_n(&(producer_pending.val), __ATOMIC_RELAXED);
std::atomic_thread_fence(std::memory_order_acquire);
auto b = consumer_head.load(std::memory_order_relaxed);
Are you absolutely sure this does what you think it does? This snippet does not sequence the load of a.val before the load of b. std::atomic_thread_fence(std::memory_order_acquire) guarantees that memory reads issued after the fence are not reordered to before the fence, but nothing prevents memory operations above the fence from sinking below it. The compiler is completely free to move the acquire fence upwards, as long as it is not reordered with other fences.
More abstractly:
a = load relaxed
memory fence acquire -- memory operations below this line may not float upwards
b = load relaxed
The compiler may transform this into:
memory fence acquire
b = load relaxed
a = load relaxed
But not into this:
a = load relaxed
b = load relaxed
memory fence acquire
Also, you should really avoid standalone memory fences and put the acquire/release ordering on the operations themselves. This usually generates better code for non-x86 targets. It matters less on x86, where in many situations even a plain mov is enough to provide sequential consistency.
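To make that concrete, a minimal sketch (using the same builtins as the question's code) of how the start of push could attach the ordering to the loads instead of to a separate fence; this only illustrates the ordering change, not a complete fix:

// ordering on the loads themselves instead of a standalone fence
dpointer a;
a.val = __atomic_load_n(&(producer_pending.val), __ATOMIC_ACQUIRE);
auto b = consumer_head.load(std::memory_order_acquire);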