Why does boost::asio::ip::tcp::socket::async_connect block the whole thread?

Time: 2015-11-02 10:13:31

Tags: c++ multithreading sockets c++11 boost

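I have been trying to figure out this problem for almost two weeks now, and I finally know at least the probable cause. What happens is that the call to socket::async_connect blocks the thread and behaves as if it were synchronous. The call looks like the one below and is made here: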

void LayerXXX::connect(const std::string& ip, const std::string& port)
{
  try
  {
    tcp::resolver resolver(io);
    tcp::resolver::query query(ip.c_str(), port.c_str());
    tcp::resolver::iterator it = resolver.resolve(query);  

    LOGMTRTTIDEBUG("StartProtocol()"<<endl);
    activeSocket->async_connect(*it, boost::bind(&LayerXXX::connectHandler, getPtr(), boost::asio::placeholders::error));
    LOGMTRTTIDEBUG("StartProtocol() started using activeSocket->async_connect."<<endl);
  }
  catch (exception&)
  {
    LOGMTRTTIERR("ERR_CONNECTION_FAILED"<<endl);
    handleError(ERR_CONNECTION_FAILED);
  }  
}

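As you can see, I have two debug messages to check whether the function returns. 90% of the time the function returns immediately, and I get this in the log: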

  Date                     Thread name                 Thread id  Method name                    Message
D 2015-11-02 10:35:39:787 [Client 192.168.8.23:1242        #6576] MyProtocol::LayerXXX::connect StartProtocol()
D 2015-11-02 10:35:39:788 [Client 192.168.8.23:1242        #6576] MyProtocol::LayerXXX::connect StartProtocol() started using activeSocket->async_connect.

But sometimes, and only when the network connection is really bad, I see only this, the last message from that thread:

  Date                     Thread name   ID     Method name                   Message
D 2015-11-02 10:35:39:787 [Client        #6576] MyProtocol::LayerXXX::connect StartProtocol()

From then on, the thread stays stuck there.

The thread I am talking about runs the boost::asio::io_service event loop, and the connect method is also called from this event loop. This is how the thread is started:

void MyClient::threadMain() {
    LOGMTRTTIDEBUG("Starting boost::asio::io_service main loop."<<endl);
    // IP and port are added to the name after connecting successfully
    ThreadNameMap::setName( "Client" );
    io_.run();
    LOGMTRTTIDEBUG("boost::asio::io_service main OVER! Thread DEAD!"<<endl);
}

The relevant stack trace while the thread is stuck:

ntdll.dll!NtWaitForSingleObject()  + 0xa bytes  
mswsock.dll!__GSHandlerCheck_SEH()  + 0x2c95 bytes  
mswsock.dll!__GSHandlerCheck_SEH()  + 0x5e0c bytes  
ws2_32.dll!WSAAccept()  + 0xd4 bytes    
ws2_32.dll!accept()  + 0x15 bytes   
myapp64d.exe!boost::asio::detail::socket_ops::call_accept<int>(int * __formal, unsigned __int64 s, sockaddr * addr, unsigned __int64 * addrlen)  Line 96 + 0x32 bytes   C++
myapp64d.exe!boost::asio::detail::socket_ops::accept(unsigned __int64 s, sockaddr * addr, unsigned __int64 * addrlen, boost::system::error_code & ec)  Line 114 + 0x19 bytes    C++
myapp64d.exe!boost::asio::detail::socket_select_interrupter::open_descriptors()  Line 90 + 0x1c bytes   C++
myapp64d.exe!boost::asio::detail::socket_select_interrupter::socket_select_interrupter()  Line 42   C++
myapp64d.exe!boost::asio::detail::select_reactor::select_reactor(boost::asio::io_service & io_service)  Line 48 + 0x86 bytes    C++
myapp64d.exe!boost::asio::detail::service_registry::create<boost::asio::detail::select_reactor>(boost::asio::io_service & owner)  Line 81 + 0x26 bytes  C++
myapp64d.exe!boost::asio::detail::service_registry::do_use_service(const boost::asio::io_service::service::key & key, boost::asio::io_service::service * (boost::asio::io_service &)* factory)  Line 123 + 0x13 bytes   C++
myapp64d.exe!boost::asio::detail::service_registry::use_service<boost::asio::detail::select_reactor>()  Line 49 C++
myapp64d.exe!boost::asio::use_service<boost::asio::detail::select_reactor>(boost::asio::io_service & ios)  Line 34  C++
myapp64d.exe!boost::asio::detail::win_iocp_socket_service_base::get_reactor()  Line 620 + 0xd bytes C++
myapp64d.exe!boost::asio::detail::win_iocp_socket_service_base::start_connect_op(boost::asio::detail::win_iocp_socket_service_base::base_implementation_type & impl, boost::asio::detail::reactor_op * op, const sockaddr * addr, unsigned __int64 addrlen)  Line 550 + 0xd bytes   C++
myapp64d.exe!boost::asio::detail::win_iocp_socket_service<boost::asio::ip::tcp>::async_connect<boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const & __ptr64>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX * __ptr64>,boost::arg<1> > > >(boost::asio::detail::win_iocp_socket_service<boost::asio::ip::tcp>::implementation_type & impl, const boost::asio::ip::basic_endpoint<boost::asio::ip::tcp> & peer_endpoint, boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const &>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX *>,boost::arg<1> > > & handler)  Line 515  C++
myapp64d.exe!boost::asio::stream_socket_service<boost::asio::ip::tcp>::async_connect<boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const & __ptr64>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX * __ptr64>,boost::arg<1> > > >(boost::asio::detail::win_iocp_socket_service<boost::asio::ip::tcp>::implementation_type & impl, const boost::asio::ip::basic_endpoint<boost::asio::ip::tcp> & peer_endpoint, const boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const &>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX *>,boost::arg<1> > > & handler)  Line 234  C++
myapp64d.exe!boost::asio::basic_socket<boost::asio::ip::tcp,boost::asio::stream_socket_service<boost::asio::ip::tcp> >::async_connect<boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const & __ptr64>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX * __ptr64>,boost::arg<1> > > >(const boost::asio::ip::basic_endpoint<boost::asio::ip::tcp> & peer_endpoint, const boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const &>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX *>,boost::arg<1> > > & handler)  Line 779 C++
myapp64d.exe!MyProtocol::LayerXXX::connect(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & ip, const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & port)  Line 225    C++

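What I know so far:

  1. It only happens when the network is dropping packets, so it must be timing related. I emulated this using clumsy 0.2 (recommended).
  2. boost::asio::ip::tcp::socket::async_connect is called from the boost::asio::io_service event loop, yet it works most of the time.
  3. According to the bug reports we get, the thread does not get unstuck within any reasonable time. So it does not look like some timeout... and in any case, the connect should be asynchronous.
  4. Our production system suffers from this problem, and I am really looking for any hint that could help us find a fix or workaround for it.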
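Because the stuck call never returns, a log line placed after it cannot report the stall. One possible diagnostic, sketched here under assumptions, is a heartbeat that the event-loop thread updates and a separate monitor thread checks. The class name Heartbeat and its members are hypothetical and not part of Boost.Asio or of the code in the question; how beat() gets called (for example from a periodic timer handler on the io_service) is also an assumption.

```cpp
#include <atomic>
#include <chrono>

// Hypothetical helper (not part of Boost.Asio): the event-loop thread calls
// beat() regularly; a monitor thread calls stalled() to see whether the loop
// has stopped making progress, e.g. because a handler is blocked.
class Heartbeat {
public:
    using Clock = std::chrono::steady_clock;

    Heartbeat() : last_(Clock::now().time_since_epoch().count()) {}

    // Record "the loop is alive right now".
    void beat() {
        last_.store(Clock::now().time_since_epoch().count(),
                    std::memory_order_relaxed);
    }

    // True if more than `limit` has passed since the last beat().
    // `now` is a parameter only so the check can be exercised without sleeping.
    bool stalled(Clock::duration limit,
                 Clock::time_point now = Clock::now()) const {
        const Clock::time_point last{
            Clock::duration{last_.load(std::memory_order_relaxed)}};
        return now - last > limit;
    }

private:
    std::atomic<Clock::rep> last_;  // tick count of the last beat
};
```

A monitor thread could poll stalled(std::chrono::seconds(5)) and dump the stack of the event-loop thread when it returns true. The threshold is arbitrary and would need tuning to the longest handler the application legitimately runs.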

1 answer:

Answer 0: (score: 0)

Solution: exclude 127.0.0.1 in the clumsy filter.

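I ran into the same problem when using Boost's async_connect together with clumsy, even though I was only using UDP sockets. I found that the problem is caused by clumsy's filter: if the filtered addresses include 127.0.0.1 (for example, outbound and ip.DstAddr >= 127.0.0.1 and ip.DstAddr <= 239.255.255.255), the problem occurs frequently; if they do not, the problem goes away. So I looked at the Boost code behind async_connect and found that it calls ::accept on 127.0.0.1 inside open_descriptors (socket_select_interrupter::open_descriptors in the stack trace above), and that call blocks if network packets are lost.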
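To make the mechanism concrete, here is a minimal sketch of the general pattern that the stack trace points at: building a connected TCP socket pair over 127.0.0.1 with blocking bind/listen/connect/accept calls. It uses POSIX sockets for brevity, whereas the trace in the question is from Winsock, and it is an illustration of the technique rather than a copy of the Boost.Asio source. The function name open_loopback_pair and its parameters are hypothetical.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>

// Illustrative only: creates a connected TCP pair over the loopback address.
// Every call below is blocking, so if loopback packets are dropped or delayed,
// connect() or accept() can stall the calling thread.
bool open_loopback_pair(int& server_fd, int& client_fd) {
    int acceptor = ::socket(AF_INET, SOCK_STREAM, 0);
    if (acceptor < 0) return false;

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // 127.0.0.1
    addr.sin_port = 0;                              // let the OS pick a port

    socklen_t len = sizeof(addr);
    if (::bind(acceptor, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
        ::getsockname(acceptor, reinterpret_cast<sockaddr*>(&addr), &len) != 0 ||
        ::listen(acceptor, 1) != 0) {
        ::close(acceptor);
        return false;
    }

    int client = ::socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) {
        ::close(acceptor);
        return false;
    }

    // Blocking connect to our own listening socket on 127.0.0.1.
    if (::connect(client, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        ::close(client);
        ::close(acceptor);
        return false;
    }

    // Blocking accept: the step that corresponds to the accept() frame
    // at the top of the stack trace in the question.
    int server = ::accept(acceptor, nullptr, nullptr);
    ::close(acceptor);
    if (server < 0) {
        ::close(client);
        return false;
    }

    server_fd = server;
    client_fd = client;
    return true;
}
```

On a healthy machine this returns almost instantly, which matches the 90% case in the question. With a packet-dropping filter that also covers 127.0.0.1, the handshake for this internal pair is subject to the same losses as real traffic, which is consistent with the answer's observation that excluding loopback from the filter makes the stall disappear.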