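Boost's ASIO dispatcher seems to have a serious problem, and I can't find a workaround. The symptom is that the only thread waiting to dispatch is left in pthread_cond_wait, even though there are pending I/O operations that require it to be blocking in epoll_wait.
I can most easily replicate this by having one thread call poll_one in a loop until it returns zero. When the thread calling poll_one breaks out of its loop, this can leave the thread calling run stuck in pthread_cond_wait. Presumably, the io_service expects the polling thread to come back and block in epoll_wait, but it is under no obligation to do so, and that expectation seems fatal.
Is there a requirement that threads be statically associated with io_services?
Here is an example showing the deadlock. This is the only thread handling this io_service, because the others have moved on. There are definitely socket operations pending: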
#0 pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (...) at /usr/include/boost/asio/detail/posix_event.hpp:80
#2 boost::asio::detail::task_io_service::do_run_one (...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:405
#3 boost::asio::detail::task_io_service::run (...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:146
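I believe the bug is as follows: if the thread servicing an I/O queue is the one blocked in the socket readiness check, and it calls a dispatch function, then it must signal whenever any other thread is blocked on the io_service. It currently signals only if there are handlers ready to run at that time. But that leaves no thread checking for socket readiness.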
Answer 0 (score: 6)
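This is a bug. I was able to duplicate it by adding a delay in the non-critical section of task_io_service::do_poll_one. Here is a snippet of the modified task_io_service::do_poll_one() in boost/asio/detail/impl/task_io_service.ipp. The only line added is the sleep.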
std::size_t task_io_service::do_poll_one(mutex::scoped_lock& lock,
    task_io_service::thread_info& this_thread,
    const boost::system::error_code& ec)
{
  if (stopped_)
    return 0;

  operation* o = op_queue_.front();
  if (o == &task_operation_)
  {
    op_queue_.pop();
    lock.unlock();

    {
      task_cleanup c = { this, &lock, &this_thread };
      (void)c;

      // Run the task. May throw an exception. Only block if the operation
      // queue is empty and we're not polling, otherwise we want to return
      // as soon as possible.
      task_->run(false, this_thread.private_op_queue);
      boost::this_thread::sleep_for(boost::chrono::seconds(3));
    }

    o = op_queue_.front();
    if (o == &task_operation_)
      return 0;
  }
  ...
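My test driver is fairly basic: an asynchronous work loop, driven by a timer, prints "." every 3 seconds; a single thread is spawned to poll the io_service; and a delay gives that thread time to poll the io_service, so that main calls io_service::run() while the poll thread sleeps in task_io_service::do_poll_one(). Test code: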
#include <iostream>

#include <boost/asio/io_service.hpp>
#include <boost/asio/steady_timer.hpp>
#include <boost/bind.hpp>
#include <boost/chrono.hpp>
#include <boost/thread.hpp>

boost::asio::io_service io_service;
boost::asio::steady_timer timer(io_service);

// Print a dot, then re-arm the timer so there is always pending work.
void arm_timer()
{
  std::cout << ".";
  std::cout.flush();
  timer.expires_from_now(boost::chrono::seconds(3));
  timer.async_wait(boost::bind(&arm_timer));
}

int main()
{
  // Add asynchronous work loop.
  arm_timer();

  // Spawn poll thread.
  boost::thread poll_thread(
    boost::bind(&boost::asio::io_service::poll, boost::ref(io_service)));

  // Give the poll thread time to service the reactor.
  boost::this_thread::sleep_for(boost::chrono::seconds(1));

  io_service.run();
}
Debugging:
[twsansbury@localhost bug]$ gdb a.out
...
(gdb) r
Starting program: /home/twsansbury/dev/bug/a.out
[Thread debugging using libthread_db enabled]
.[New Thread 0xb7feeb90 (LWP 31892)]
[Thread 0xb7feeb90 (LWP 31892) exited]
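At this point, arm_timer() has printed "." once (when it was initially armed). The poll thread serviced the reactor in a non-blocking manner and slept for 3 seconds while op_queue_ was empty (task_operation_ is added back to op_queue_ when task_cleanup c goes out of scope). While op_queue_ was empty, the main thread called io_service::run(), saw that op_queue_ was empty, and made itself the first_idle_thread_, where it waits on its wakeup_event. The poll thread finishes sleeping and returns 0, leaving the main thread waiting on wakeup_event.
After waiting about 10 seconds, plenty of time for arm_timer() to be ready, I interrupt the debugger: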
Program received signal SIGINT, Interrupt.
0x00919402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00919402 in __kernel_vsyscall ()
#1  0x0081bbc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x00763b3d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x08059dc2 in void boost::asio::detail::posix_event::wait >(boost::asio::detail::scoped_lock&) ()
#4  0x0805a009 in boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock&, boost::asio::detail::task_io_service_thread_info&, boost::system::error_code const&) ()
#5  0x0805a11c in boost::asio::detail::task_io_service::run(boost::system::error_code&) ()
#6  0x0805a1e2 in boost::asio::io_service::run() ()
#7  0x0804db78 in main ()
A side-by-side timeline is as follows:
poll thread                            | main thread
---------------------------------------+---------------------------------------
lock()                                 |
do_poll_one()                          |
|-- pop task_operation_ from           |
|   op_queue_                          |
|-- unlock()                           | lock()
|-- create task_cleanup                | do_run_one()
|-- service reactor (non-block)        | `-- op_queue_ is empty
|-- ~task_cleanup()                    |     |-- set thread as idle
|   |-- lock()                         |     `-- unlock()
|   `-- op_queue_.push(                |
|       task_operation_)               |
`-- task_operation_ is                 |
    op_queue_.front()                  |
    `-- return 0                       | // still waiting on wakeup_event
unlock()                               |
As best I can tell, there are no side effects from patching:
if (o == &task_operation_)
  return 0;
to:
if (o == &task_operation_)
{
  if (!one_thread_)
    wake_one_thread_and_unlock(lock);
  return 0;
}
Regardless, I have submitted a bug and fix. Consider keeping an eye on the ticket for an official response.