Question

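I have code that dispatches tasks to an asio io_service object to be processed remotely. As far as I can tell, the code behaves correctly. Unfortunately, I know very little about memory ordering, and I am not sure which memory orders I should use when checking the atomic flags to get the best performance.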

//boost::asio::io_service;
//^^ Declared outside this scope
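std::vector<std::atomic_bool> flags(num_of_threads);
//^^ Value-initialized, so every flag starts as false ((num_of_threads, false) does not compile: std::atomic_bool is not copyable)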
//std::vector<std::thread> threads(num_of_threads);
//^^ Declared outside this scope, all of them simply call the run() method on io_service

for(int i = 0; i < num_of_threads; i++) {
    io_service.post([&, i]{
        /*...*/
        flags[i].store(true, /*[[[1]]]*/);
    });
}

for(std::atomic_bool & atm_bool : flags) while(!atm_bool.load(/*[[[2]]]*/)) std::this_thread::yield();

Basically, what I want to know is: what should I put in place of [[[1]]] and [[[2]]]?

If it helps, the code is functionally similar to the following:

std::vector<std::thread> threads;
for(int i = 0; i < num_of_threads; i++) threads.emplace_back([]{/*...*/});
for(std::thread & thread : threads) thread.join();

The difference is that my code keeps the threads alive in an external thread pool and dispatches tasks to them.

Answer 1

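You want to establish a happens-before relationship between the thread that sets a flag and the thread that sees it set. That way, once the waiting thread observes the flag as set, it is also guaranteed to observe the effects of everything the other thread did before setting it (without that relationship, there is no such guarantee).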
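As a minimal sketch of that guarantee (the function name `publish_and_read` and the value 42 are hypothetical, chosen only for illustration), one thread writes an ordinary non-atomic `int` and then sets a flag; the other thread spins on the flag and then reads the `int`. The release/acquire pair is what makes that final read race-free:

```cpp
#include <atomic>
#include <thread>

// Hypothetical demo: publish a plain (non-atomic) value from one thread to another.
int publish_and_read() {
    int payload = 0;                 // ordinary data, no atomics
    std::atomic<bool> ready{false};  // the flag that publishes it

    std::thread writer([&] {
        payload = 42;                                  // (a) ordinary write
        ready.store(true, std::memory_order_release);  // (b) release: (a) may not be reordered after (b)
    });

    // Acquire load: once it reads true, everything the writer did before (b) is visible here.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    int seen = payload;  // guaranteed to read 42, no data race
    writer.join();
    return seen;
}
```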

This can be done with release-acquire semantics:

flags[i].store(true, std::memory_order_release);
// ...
while (!atm_bool.load(std::memory_order_acquire)) ...
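Applied to the question's pattern, a self-contained sketch might look like the following. It assumes plain `std::thread` workers as a stand-in for the asio pool (Boost is not used here); `run_tasks_with_flags` and the `i * i` "work" are hypothetical:

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Sketch: each task writes its own result slot, then publishes it with a release store.
std::vector<int> run_tasks_with_flags(std::size_t num_tasks) {
    std::vector<int> results(num_tasks, 0);           // non-atomic; one slot per task
    std::vector<std::atomic<bool>> flags(num_tasks);  // value-initialized: all false

    std::vector<std::thread> workers;  // stand-in for the io_service threads
    for (std::size_t i = 0; i < num_tasks; ++i) {
        workers.emplace_back([&, i] {
            results[i] = static_cast<int>(i * i);             // the task's "work"
            flags[i].store(true, std::memory_order_release);  // [[[1]]]
        });
    }

    // Host thread: each acquire load pairs with the matching release store above.
    for (std::atomic<bool>& flag : flags) {
        while (!flag.load(std::memory_order_acquire)) {  // [[[2]]]
            std::this_thread::yield();
        }
    }
    std::vector<int> out = results;  // safe to read every slot now, even before join()

    for (std::thread& t : workers) t.join();  // needed only because this sketch owns its threads
    return out;
}
```

With the question's persistent io_service pool there is nothing to join; the acquire loop on its own is the synchronization point.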

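Note that in this case, a blocking OS-level semaphore would probably be cleaner than spin-waiting on an array of flags. Failing that, spinning on a single count of completed tasks would still be somewhat more efficient than checking an array with one flag per task.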
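Both suggestions can be sketched as follows, again with `std::thread` standing in for the pool and with hypothetical function names. The blocking variant uses `std::mutex` plus `std::condition_variable` in place of an OS semaphore. That is a substitution chosen because it works in C++11; C++20 also provides `std::counting_semaphore`.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// (1) Spin on one completion counter instead of one flag per task.
std::size_t wait_with_counter(std::size_t num_tasks) {
    std::atomic<std::size_t> done{0};
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < num_tasks; ++i) {
        workers.emplace_back([&] {
            /* ...task work... */
            done.fetch_add(1, std::memory_order_release);
        });
    }
    // Every modification of `done` is a read-modify-write, so the acquire load that
    // reads num_tasks synchronizes with every worker's release fetch_add, not only the last.
    while (done.load(std::memory_order_acquire) < num_tasks) {
        std::this_thread::yield();
    }
    std::size_t seen = done.load(std::memory_order_relaxed);
    for (std::thread& t : workers) t.join();
    return seen;
}

// (2) Blocking wait: the host sleeps instead of spinning. The mutex provides the
// required ordering, so no explicit memory_order argument is needed.
std::size_t wait_blocking(std::size_t num_tasks) {
    std::mutex m;
    std::condition_variable cv;
    std::size_t done = 0;  // guarded by m
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < num_tasks; ++i) {
        workers.emplace_back([&] {
            /* ...task work... */
            {
                std::lock_guard<std::mutex> lock(m);
                ++done;
            }
            cv.notify_one();
        });
    }
    std::size_t seen;
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return done == num_tasks; });  // predicate guards against spurious wakeups
        seen = done;
    }
    for (std::thread& t : workers) t.join();  // also keeps cv alive until every notify_one returns
    return seen;
}
```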

Which memory order should I use for a host thread waiting on worker threads?
