Only thread servicing io_service is waiting even though asynchronous I/O operations are pending

Posted: 2013-03-30 01:14:49

Tags: linux multithreading boost-asio

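Boost.Asio's dispatcher seems to have a serious problem, and I can't find a way around it. The symptom is that the only thread waiting to dispatch is left in pthread_cond_wait, even though there are pending I/O operations that require it to block in epoll_wait.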

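I can easily reproduce the problem by having one thread call poll_one in a loop until it returns zero. This can leave the thread calling run stuck in pthread_cond_wait while the thread calling poll_one breaks out of its loop. Presumably the io_service expects that thread to come back and block in epoll_wait, but it is under no obligation to do so, and that expectation appears to be fatal.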

Is there a requirement that threads be statically associated with io_services?

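Here is an example showing the deadlock. This is the only thread servicing this io_service, because the others have moved on. There are definitely socket operations pending: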

#0 pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (...) at /usr/include/boost/asio/detail/posix_event.hpp:80
#2 boost::asio::detail::task_io_service::do_run_one (...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:405
#3 boost::asio::detail::task_io_service::run (...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:146

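I believe the bug is as follows: if the thread servicing the I/O queue is the one blocking on the socket readiness check, and it then calls a dispatch function, it must signal one of the other threads blocked on the io_service, if there are any. Currently it signals only if there are handlers ready to run at that moment, which leaves no thread checking for socket readiness.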

1 answer:

Answer 0 (score: 6)

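This is a bug. I have been able to reproduce it by adding a delay to a non-critical section of task_io_service::do_poll_one. Below is a snippet of the modified task_io_service::do_poll_one() in boost/asio/detail/impl/task_io_service.ipp. The only addition is the sleep.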

std::size_t task_io_service::do_poll_one(mutex::scoped_lock& lock,
    task_io_service::thread_info& this_thread,
    const boost::system::error_code& ec)
{
  if (stopped_)
    return 0;

  operation* o = op_queue_.front();
  if (o == &task_operation_)
  {
    op_queue_.pop();
    lock.unlock();

    {
      task_cleanup c = { this, &lock, &this_thread };
      (void)c;

      // Run the task. May throw an exception. Only block if the operation
      // queue is empty and we're not polling, otherwise we want to return
      // as soon as possible.
      task_->run(false, this_thread.private_op_queue);
      // Added: widens the window in which task_operation_ is off op_queue_.
      boost::this_thread::sleep_for(boost::chrono::seconds(3));
    }

    o = op_queue_.front();
    if (o == &task_operation_)
      return 0;
  }

...

My test driver is very basic:

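  • An asynchronous work loop via a timer that prints "." every 3 seconds.
  • Spawn a single thread that polls the io_service.
  • A delay that gives the new thread time to poll the io_service, after which main calls io_service::run() while the poll thread is sleeping in task_io_service::do_poll_one().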

Test code:

#include <iostream>

#include <boost/asio/io_service.hpp>
#include <boost/asio/steady_timer.hpp>
#include <boost/bind.hpp>
#include <boost/chrono.hpp>
#include <boost/thread.hpp>

boost::asio::io_service io_service;
boost::asio::steady_timer timer(io_service);

void arm_timer()
{
  std::cout << ".";
  std::cout.flush();
  timer.expires_from_now(boost::chrono::seconds(3));
  timer.async_wait(boost::bind(&arm_timer));
}

int main()
{
  // Add asynchronous work loop.
  arm_timer();

  // Spawn poll thread.
  boost::thread poll_thread(
    boost::bind(&boost::asio::io_service::poll, boost::ref(io_service)));

  // Give time for poll thread service reactor.
  boost::this_thread::sleep_for(boost::chrono::seconds(1));

  io_service.run();
}

Debugging:

[twsansbury@localhost bug]$ gdb a.out 
...
(gdb) r
Starting program: /home/twsansbury/dev/bug/a.out 

[Thread debugging using libthread_db enabled]
.[New Thread 0xb7feeb90 (LWP 31892)]
[Thread 0xb7feeb90 (LWP 31892) exited]

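At this point, arm_timer() has printed "." once (when it was initially armed). The poll thread serviced the reactor in a non-blocking manner and then slept for 3 seconds while op_queue_ was empty (task_operation_ is added back to op_queue_ when task_cleanup c exits its scope). While op_queue_ was empty, the main thread called io_service::run(), saw that op_queue_ was empty, made itself the first_idle_thread_, and began waiting on its wakeup_event. The poll thread then finished sleeping and returned 0, leaving the main thread waiting on wakeup_event.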

After waiting 10 seconds, which gives the timer armed by arm_timer() more than enough time to become ready, I interrupt the debugger:

Program received signal SIGINT, Interrupt.
0x00919402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00919402 in __kernel_vsyscall ()
#1  0x0081bbc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x00763b3d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x08059dc2 in void boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> >(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&) ()
#4  0x0805a009 in boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service_thread_info&, boost::system::error_code const&) ()
#5  0x0805a11c in boost::asio::detail::task_io_service::run(boost::system::error_code&) ()
#6  0x0805a1e2 in boost::asio::io_service::run() ()
#7  0x0804db78 in main ()

The side-by-side timeline is as follows:

          poll thread                  |          main thread
---------------------------------------+---------------------------------------
  lock()                               | 
  do_poll_one()                        |                          
  |-- pop task_operation_ from         |
  |   op_queue_                        |
  |-- unlock()                         |  lock()
  |-- create task_cleanup              |  do_run_one()
  |-- service reactor (non-block)      |  `-- op_queue_ is empty
  |-- ~task_cleanup()                  |      |-- set thread as idle
  |   |-- lock()                       |      `-- unlock()
  |   `-- op_queue_.push(              |
  |       task_operation_)             |
  `-- task_operation_ is               | 
      op_queue_.front()                |
      `-- return 0                     |  // still waiting on wakeup_event
  unlock()                             |
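To make the interleaving concrete, here is a deterministic, single-threaded replay of the timeline. It is a toy model, not Boost.Asio code: `ToyState` and `replay_timeline` are hypothetical names, the queue holds strings standing in for operations, and the `one_thread_` check from the patch is left out. Because each step runs in the order the timeline shows, the result does not depend on real thread scheduling.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical toy state named after the members in the timeline;
// these are not Boost.Asio's real types.
struct ToyState {
  std::deque<std::string> op_queue{"task_operation_"};
  std::size_t idle_threads = 0;  // threads parked on their wakeup_event
  bool reactor_in_use = false;   // some thread is inside the reactor
};

// Replays the poll thread / main thread interleaving step by step.
// Returns true if the run() thread ends up parked with nobody
// serving the reactor.
bool replay_timeline(bool patched) {
  ToyState s;

  // poll thread: do_poll_one() pops task_operation_, then unlock().
  s.op_queue.pop_front();
  s.reactor_in_use = true;

  // main thread: do_run_one() finds op_queue_ empty, marks itself idle,
  // and waits on its wakeup_event.
  if (s.op_queue.empty()) ++s.idle_threads;

  // poll thread: the non-blocking reactor pass finds nothing ready, and
  // ~task_cleanup() pushes task_operation_ back onto op_queue_.
  s.reactor_in_use = false;
  s.op_queue.push_back("task_operation_");

  // poll thread: task_operation_ is at the front, so it returns 0.
  if (s.op_queue.front() == "task_operation_" && patched &&
      s.idle_threads > 0) {
    // Patched path: wake one idle thread; that thread then picks up
    // the task and runs the reactor itself.
    --s.idle_threads;
    s.op_queue.pop_front();
    s.reactor_in_use = true;
  }

  return s.idle_threads > 0 && !s.reactor_in_use;
}
```

`replay_timeline(false)` returns `true` (the run() thread is stranded with the task sitting in the queue), while `replay_timeline(true)` returns `false`, because the extra wake-up hands the re-queued task to the parked thread.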

As far as I can tell, there are no side effects to changing:

if (o == &task_operation_)
  return 0;

to:

if (o == &task_operation_)
{
  if (!one_thread_)
    wake_one_thread_and_unlock(lock);
  return 0;
}
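The following is a hedged sketch of why the extra wake-up matters, using real threads and only standard C++. `ToyScheduler`, `poll_one(between)`, `run_until` and `run_thread_completes` are hypothetical names for a drastically simplified stand-in for task_io_service. A deque holds either the reactor sentinel or ready handlers, idle threads park on a `std::condition_variable` guarded by a `wake_pending_` flag (in the spirit of posix_event), and the reactor is simulated. The `between` callback plays the role of the sleep added to do_poll_one(): it keeps the reactor until the run thread has parked itself, which forces the race from the timeline.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical toy stand-in for task_io_service (NOT Boost.Asio code).
class ToyScheduler {
 public:
  explicit ToyScheduler(bool patched) : patched_(patched) {
    queue_.push_back(Op{true, nullptr});  // sentinel playing task_operation_
  }
  // A pending async operation; the next blocking reactor pass completes it.
  void add_pending(std::function<void()> handler) {
    std::lock_guard<std::mutex> lock(mutex_);
    pending_.push_back(std::move(handler));
    ++outstanding_work_;
  }
  std::size_t idle_threads() {
    std::lock_guard<std::mutex> lock(mutex_);
    return idle_threads_;
  }
  // Non-blocking like poll_one(); only the "task at the front" path is modeled.
  std::size_t poll_one(const std::function<void()>& between) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (queue_.empty() || !queue_.front().is_task) return 0;
    queue_.pop_front();
    lock.unlock();
    between();  // stands in for the added sleep; nothing is ready yet
    lock.lock();
    queue_.push_back(Op{true, nullptr});  // task_cleanup re-queues the task
    // Only the task is queued here, so return 0; the fix wakes a parked thread.
    if (patched_ && idle_threads_ > 0) {
      wake_pending_ = true;
      wakeup_.notify_one();
    }
    return 0;
  }
  // Like run(), but returns false if left idle for longer than `limit`.
  bool run_until(std::chrono::milliseconds limit) {
    std::unique_lock<std::mutex> lock(mutex_);
    while (outstanding_work_ > 0) {
      if (queue_.empty()) {  // nothing to do: park as an idle thread
        ++idle_threads_;
        bool woken = wakeup_.wait_for(lock, limit, [this] { return wake_pending_; });
        --idle_threads_;
        if (!woken) return false;  // stranded: nobody woke this thread
        wake_pending_ = false;
        continue;
      }
      Op op = std::move(queue_.front());
      queue_.pop_front();
      if (op.is_task) {
        // Simulated blocking reactor pass: every pending operation completes.
        for (auto& h : pending_) queue_.push_back(Op{false, std::move(h)});
        pending_.clear();
        queue_.push_back(Op{true, nullptr});
      } else {
        lock.unlock();
        op.handler();
        lock.lock();
        --outstanding_work_;
      }
    }
    return true;
  }

 private:
  struct Op { bool is_task; std::function<void()> handler; };
  const bool patched_;
  std::mutex mutex_;
  std::condition_variable wakeup_;
  std::deque<Op> queue_;
  std::deque<std::function<void()>> pending_;
  std::size_t outstanding_work_ = 0, idle_threads_ = 0;
  bool wake_pending_ = false;
};

// Forces the race: the poll thread keeps the reactor until run() has parked.
bool run_thread_completes(bool patched) {
  ToyScheduler s(patched);
  bool handler_ran = false;
  s.add_pending([&handler_ran] { handler_ran = true; });  // e.g. the timer
  std::promise<void> reactor_taken;
  std::future<void> taken = reactor_taken.get_future();
  std::thread poll_thread([&] {
    s.poll_one([&] {
      reactor_taken.set_value();  // main may now call run_until()
      while (s.idle_threads() == 0) std::this_thread::sleep_for(std::chrono::milliseconds(1));
    });
  });
  taken.wait();
  bool finished = s.run_until(std::chrono::milliseconds(200));
  poll_thread.join();
  return finished && handler_ran;
}
```

With `patched` set, the parked thread is woken, picks up the re-queued task, runs the simulated reactor pass and then the timer handler, so `run_thread_completes(true)` returns `true`. Without it, the run thread sits idle until the 200 ms limit expires and `run_thread_completes(false)` returns `false`, the toy counterpart of the hang shown in the backtrace.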

Either way, I have submitted a bug and fix. Consider watching the ticket for an official response.