Question

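I'm implementing a lock-free single-producer, single-consumer queue for a network-intensive application. I have a bunch of worker threads that receive work in their own separate queues, which they then dequeue and process.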

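Removing the locks from these queues greatly improved performance under high load, but the threads no longer block when their queue is empty, which in turn makes CPU usage skyrocket.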

How can I efficiently make a thread block until it either successfully dequeues an item or is killed/interrupted?

Answer 1

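If you're on Linux, look into using a futex. It gives you the performance of a non-locking implementation by using atomic operations instead of kernel calls, as a mutex would, but if the process has to be put to sleep because some condition isn't met (i.e., lock contention), it then makes the appropriate kernel calls to put the process to sleep and wake it up again on a future event. It's basically like a very fast semaphore.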
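A minimal sketch of that idea for one SPSC queue, assuming Linux and exactly one consumer: a counter of available items that the consumer decrements with plain atomics on the fast path, calling FUTEX_WAIT only when the counter is zero. The names `item_sem`, `item_sem_post` and `item_sem_wait` are hypothetical; glibc has no `futex()` wrapper, so the raw `syscall(SYS_futex, ...)` is used.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: counts items pushed but not yet claimed. */
typedef struct {
    atomic_int count;
} item_sem;

static long futex_syscall(atomic_int *addr, int op, int val) {
    /* No glibc wrapper exists, so invoke the system call directly. */
    return syscall(SYS_futex, (int *)addr, op, val, NULL, NULL, 0);
}

/* Producer: call after the item is already in the lock-free queue. */
static void item_sem_post(item_sem *s) {
    /* With a single consumer, only the 0 -> 1 transition can have a
       sleeper behind it, so other posts skip the system call. */
    if (atomic_fetch_add(&s->count, 1) == 0)
        futex_syscall(&s->count, FUTEX_WAKE_PRIVATE, 1);
}

/* Consumer: returns once an item is available, claiming one unit. */
static void item_sem_wait(item_sem *s) {
    for (;;) {
        int c = atomic_load(&s->count);
        if (c > 0) {
            /* Fast path: atomics only, no kernel involvement. */
            if (atomic_compare_exchange_weak(&s->count, &c, c - 1))
                return;
            continue;
        }
        /* The kernel sleeps only if count is still 0 when it checks; a post
           racing with us makes FUTEX_WAIT fail with EAGAIN instead of losing
           the wakeup. Spurious returns just loop and re-check. */
        if (futex_syscall(&s->count, FUTEX_WAIT_PRIVATE, 0) == -1)
            assert(errno == EAGAIN || errno == EINTR);
    }
}
```

The producer pushes into the lock-free queue and then calls `item_sem_post()`; the consumer calls `item_sem_wait()` and then pops, which is guaranteed to find an item. Waking only on the 0 → 1 transition is what keeps the producer's fast path syscall-free, and it is only safe because there is a single consumer.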

Answer 2

On Linux, a futex can be used to block a thread. But be aware: Futexes Are Tricky!

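Update: condition variables are safer and more portable than futexes. However, a condition variable is used together with a mutex, so strictly speaking the result is no longer lock-free. Still, if your main goal is performance (rather than a guarantee of global progress) and the locked part (i.e., the condition the thread checks after waking up) is small, you may get satisfactory results without having to get into the subtleties of integrating futexes into the algorithm.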
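A minimal sketch of that trade-off with C11 atomics and POSIX threads, assuming one producer, one consumer and non-NULL items. The ring buffer and the names `spsc_queue`, `q_push`, `q_pop` and `consumer_sleeping` are hypothetical illustrations, not the asker's queue. Push and pop stay lock-free while the queue is non-empty; the mutex is only taken around the sleep/wake handshake.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CAP 1024  /* hypothetical fixed capacity */

typedef struct {
    void *slots[CAP];
    atomic_size_t head;             /* next slot to pop (consumer-owned) */
    atomic_size_t tail;             /* next slot to push (producer-owned) */
    atomic_bool consumer_sleeping;  /* set by the consumer before it blocks */
    pthread_mutex_t lock;           /* guards only the sleep/wake handshake */
    pthread_cond_t nonempty;
} spsc_queue;

static void q_init(spsc_queue *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
    atomic_init(&q->consumer_sleeping, false);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* Lock-free paths; default seq_cst atomics keep the handshake simple. */
static bool q_try_push(spsc_queue *q, void *item) {
    size_t t = atomic_load(&q->tail);
    if (t - atomic_load(&q->head) == CAP)
        return false;                   /* full */
    q->slots[t % CAP] = item;
    atomic_store(&q->tail, t + 1);      /* publish the slot */
    return true;
}

static void *q_try_pop(spsc_queue *q) {
    size_t h = atomic_load(&q->head);
    if (h == atomic_load(&q->tail))
        return NULL;                    /* empty */
    void *item = q->slots[h % CAP];
    atomic_store(&q->head, h + 1);      /* hand the slot back */
    return item;
}

/* Producer: push, then take the lock only if the consumer may be asleep. */
static bool q_push(spsc_queue *q, void *item) {
    if (!q_try_push(q, item))
        return false;
    if (atomic_load(&q->consumer_sleeping)) {
        pthread_mutex_lock(&q->lock);
        pthread_cond_signal(&q->nonempty);
        pthread_mutex_unlock(&q->lock);
    }
    return true;
}

/* Consumer: lock-free while items are available, blocks when empty. */
static void *q_pop(spsc_queue *q) {
    void *item = q_try_pop(q);
    if (item)
        return item;                    /* fast path, no lock */
    pthread_mutex_lock(&q->lock);
    atomic_store(&q->consumer_sleeping, true);
    /* Either this re-check sees the producer's new tail, or the producer
       sees the flag; it then needs the lock to signal, which it can only
       get once we are inside pthread_cond_wait (or have found an item). */
    while ((item = q_try_pop(q)) == NULL)
        pthread_cond_wait(&q->nonempty, &q->lock);
    atomic_store(&q->consumer_sleeping, false);
    pthread_mutex_unlock(&q->lock);
    return item;
}
```

A stale `consumer_sleeping == true` only causes a harmless extra signal; a missed wakeup would need both the producer's flag check and the consumer's re-check to miss each other, which sequentially consistent atomics rule out.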

Answer 3

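If you're on Windows, you can't use futexes, but Windows Vista has a similar mechanism called Keyed Events. Unfortunately, it isn't part of the published API (it's an NTDLL native API), but you can use it as long as you accept the caveat that it may change in future versions of Windows (and you don't need to run on pre-Vista kernels). Be sure to read the article I linked above. Here is an untested sketch of how it might work: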

/* Interlocked SList queue using keyed event signaling */

#include <windows.h>
#include <assert.h>

// Note: NtCreateKeyedEvent, NtReleaseKeyedEvent and NtWaitForKeyedEvent are
// undocumented ntdll.dll exports that the public SDK headers do not declare;
// you have to declare them yourself and link against ntdll.

typedef struct queue {
    SLIST_HEADER slist;
    // Note: Multiple queues can (and should) share a keyed event handle
    HANDLE keyed_event;
    // Initial value: 0
    // Prior to blocking, the queue_pop function changes this to 1, then
    // rechecks the queue. If it finds an item, it attempts to compare-exchange
    // it back to 0; if this fails, then it's racing with a push, and has to block
    LONG block_flag;
} queue;

void init_queue(queue *qPtr) {
    NtCreateKeyedEvent(&qPtr->keyed_event, -1, NULL, 0);
    InitializeSListHead(&qPtr->slist);
    qPtr->block_flag = 0;
}

void queue_push(queue *qPtr, SLIST_ENTRY *entry) {
    InterlockedPushEntrySList(&qPtr->slist, entry);

    // Transition block flag 1 -> 0. If this succeeds (block flag was 1), we
    // have committed to a keyed-event handshake
    LONG oldv = InterlockedCompareExchange(&qPtr->block_flag, 0, 1);
    if (oldv) {
        NtReleaseKeyedEvent(qPtr->keyed_event, (PVOID)qPtr, FALSE, NULL);
    }
}

SLIST_ENTRY *queue_pop(queue *qPtr) {
    SLIST_ENTRY *entry = InterlockedPopEntrySList(&qPtr->slist);
    if (entry)
        return entry; // fast path

    // Transition block flag 0 -> 1. We must recheck the queue after this point
    // in case we race with queue_push; however, since NtReleaseKeyedEvent
    // blocks until it is matched up with a wait, we must perform the wait if
    // queue_push sees us
    LONG oldv = InterlockedCompareExchange(&qPtr->block_flag, 1, 0);

    assert(oldv == 0);

    entry = InterlockedPopEntrySList(&qPtr->slist);
    if (entry) {
        // Try to abort
        oldv = InterlockedCompareExchange(&qPtr->block_flag, 0, 1);
        if (oldv == 1)
            return entry; // nobody saw us, we can just exit with the value
    }

    // Either we don't have an entry, or we are forced to wait because
    // queue_push saw our block flag. So do the wait
    NtWaitForKeyedEvent(qPtr->keyed_event, (PVOID)qPtr, FALSE, NULL);
    // block_flag has been reset by queue_push

    if (!entry)
        entry = InterlockedPopEntrySList(&qPtr->slist);
    assert(entry);

    return entry;
}

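You can also use a similar protocol with Slim Reader/Writer locks and Condition Variables, keeping a lock-free fast path. These are wrappers around keyed events, so they may incur more overhead than using keyed events directly.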

Answer 4

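Have you tried waiting on a condition variable? When the queue becomes empty, just start waiting for a new job. The thread that puts a job into the queue should signal the condition. That way, the lock is only used when the queue is empty.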

https://computing.llnl.gov/tutorials/pthreads/#ConditionVariables

Answer 5

You can use the sigwait() function to put the thread to sleep, and wake it up with pthread_kill. This is much faster than condition variables.
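A minimal sketch of that mechanism, assuming SIGUSR1 is not used for anything else in the program and that it is blocked in every thread (so it is only ever consumed through sigwait()). The helper names and the `try_pop` callback are hypothetical stand-ins for the lock-free queue's non-blocking dequeue.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

#define WAKE_SIG SIGUSR1  /* assumption: otherwise unused in the program */

/* Call in main() before creating any threads, so every thread inherits a
   signal mask with WAKE_SIG blocked. */
static void block_wake_signal(void) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, WAKE_SIG);
    int rc = pthread_sigmask(SIG_BLOCK, &set, NULL);
    assert(rc == 0);
}

/* Producer side: after pushing an item, poke the consumer. If the consumer
   is not inside sigwait() yet, the blocked signal simply stays pending. */
static void wake_consumer(pthread_t consumer) {
    int rc = pthread_kill(consumer, WAKE_SIG);
    assert(rc == 0);
}

/* Consumer side: sleep until WAKE_SIG is pending, then consume it. */
static void wait_for_wakeup(void) {
    sigset_t set;
    int sig;
    sigemptyset(&set);
    sigaddset(&set, WAKE_SIG);
    int rc = sigwait(&set, &sig);
    assert(rc == 0 && sig == WAKE_SIG);
}

/* Blocking pop built on a hypothetical non-blocking try_pop that returns
   NULL when the queue is empty. */
static void *pop_blocking(void *(*try_pop)(void *), void *queue) {
    for (;;) {
        void *item = try_pop(queue);
        if (item)
            return item;
        /* Standard signals do not queue: several pushes may collapse into
           one pending signal, so always re-check the queue after waking. */
        wait_for_wakeup();
    }
}
```

Because the signal stays pending when it is sent before the consumer reaches sigwait(), a push that races with the consumer's emptiness check cannot be lost; at worst the consumer wakes once more than necessary.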

Answer 6

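You could add a sleep while waiting. Just pick the longest wait you're willing to put up with, then do something like this (pseudocode, since I don't remember the pthread syntax):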

WAIT_TIME = 100; // milliseconds; set this to whatever you're happy with
while (loop_condition) {
    thing = get_from_queue()
    if (thing == null) {
        sleep(WAIT_TIME);  // sleep for WAIT_TIME milliseconds (POSIX sleep() takes seconds)
    } else {
        handle(thing);
    }
}

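Even a sleep as short as 100 ms will significantly reduce CPU usage. I'm not sure at what point the context switches would make this worse than busy waiting.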
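A sketch of the same polling loop in C, assuming POSIX: since `sleep()` only takes whole seconds, a 100 ms back-off needs `nanosleep()`. The `try_pop` and `handle` callbacks and the `keep_running` flag are hypothetical stand-ins for your own queue code and loop condition.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define WAIT_TIME_MS 100  /* longest extra latency you are willing to accept */

/* Sleep for ms milliseconds. */
static void sleep_ms(long ms) {
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    /* If a signal interrupts the sleep, nanosleep() writes the remaining
       time back into ts, so go back to sleep for the rest. */
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Polling consumer: try_pop returns NULL when the queue is empty. */
static void consume_polling(void *(*try_pop)(void *queue),
                            void (*handle)(void *item),
                            void *queue, atomic_bool *keep_running) {
    while (atomic_load(keep_running)) {
        void *thing = try_pop(queue);
        if (thing == NULL)
            sleep_ms(WAIT_TIME_MS);  /* empty: back off instead of spinning */
        else
            handle(thing);
    }
}
```

The cost of this design is latency rather than CPU: an item that arrives just after the consumer goes to sleep waits up to `WAIT_TIME_MS` before it is handled.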

How to implement lock-free, but blocking behavior?
