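For almost two weeks now I have been trying to get to the bottom of this problem, and I finally know at least the probable cause. What happens is that the call to socket::async_connect blocks the calling thread, so it behaves as if it were synchronous. The call looks like the one below and happens in: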
void LayerXXX::connect(const std::string& ip, const std::string& port)
{
    try
    {
        // Resolve the endpoint synchronously; resolve() throws on failure.
        tcp::resolver resolver(io);
        tcp::resolver::query query(ip.c_str(), port.c_str());
        tcp::resolver::iterator it = resolver.resolve(query);
        LOGMTRTTIDEBUG("StartProtocol()"<<endl);
        // Expected to return immediately; the handler is invoked later from a thread running the io_service.
        activeSocket->async_connect(*it, boost::bind(&LayerXXX::connectHandler, getPtr(), boost::asio::placeholders::error));
        LOGMTRTTIDEBUG("StartProtocol() started using activeSocket->async_connect."<<endl);
    }
    catch (exception&)
    {
        LOGMTRTTIERR("ERR_CONNECTION_FAILED"<<endl);
        handleError(ERR_CONNECTION_FAILED);
    }
}
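As you can see, I have two debug messages to check whether the function returns. About 90% of the time the function returns immediately, and I get this in the log: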
Date Thread name Thread id Method name Message
D 2015-11-02 10:35:39:787 [Client 192.168.8.23:1242 #6576] MyProtocol::LayerXXX::connect StartProtocol()
D 2015-11-02 10:35:39:788 [Client 192.168.8.23:1242 #6576] MyProtocol::LayerXXX::connect StartProtocol() started using activeSocket->async_connect.
But sometimes, and only when the network connection is very bad, I see just this, which is the last message from that thread:
Date Thread name ID Method name Message
D 2015-11-02 10:35:39:787 [Client #6576] MyProtocol::LayerXXX::connect StartProtocol()
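From then on, the thread stays stuck at that point.
The thread in question runs the boost::asio::io_service event loop, and the connect method is also called from within this event loop. This is how the thread is started: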
void MyClient::threadMain() {
    LOGMTRTTIDEBUG("Starting boost::asio::io_service main loop."<<endl);
    // IP and port are added to the name after connecting successfully
    ThreadNameMap::setName( "Client" );
    // Blocks until the io_service runs out of work or is stopped.
    io_.run();
    LOGMTRTTIDEBUG("boost::asio::io_service main OVER! Thread DEAD!"<<endl);
}
Relevant stack trace when the thread is stuck:
ntdll.dll!NtWaitForSingleObject() + 0xa bytes
mswsock.dll!__GSHandlerCheck_SEH() + 0x2c95 bytes
mswsock.dll!__GSHandlerCheck_SEH() + 0x5e0c bytes
ws2_32.dll!WSAAccept() + 0xd4 bytes
ws2_32.dll!accept() + 0x15 bytes
myapp64d.exe!boost::asio::detail::socket_ops::call_accept<int>(int * __formal, unsigned __int64 s, sockaddr * addr, unsigned __int64 * addrlen) Line 96 + 0x32 bytes C++
myapp64d.exe!boost::asio::detail::socket_ops::accept(unsigned __int64 s, sockaddr * addr, unsigned __int64 * addrlen, boost::system::error_code & ec) Line 114 + 0x19 bytes C++
myapp64d.exe!boost::asio::detail::socket_select_interrupter::open_descriptors() Line 90 + 0x1c bytes C++
myapp64d.exe!boost::asio::detail::socket_select_interrupter::socket_select_interrupter() Line 42 C++
myapp64d.exe!boost::asio::detail::select_reactor::select_reactor(boost::asio::io_service & io_service) Line 48 + 0x86 bytes C++
myapp64d.exe!boost::asio::detail::service_registry::create<boost::asio::detail::select_reactor>(boost::asio::io_service & owner) Line 81 + 0x26 bytes C++
myapp64d.exe!boost::asio::detail::service_registry::do_use_service(const boost::asio::io_service::service::key & key, boost::asio::io_service::service * (boost::asio::io_service &)* factory) Line 123 + 0x13 bytes C++
myapp64d.exe!boost::asio::detail::service_registry::use_service<boost::asio::detail::select_reactor>() Line 49 C++
myapp64d.exe!boost::asio::use_service<boost::asio::detail::select_reactor>(boost::asio::io_service & ios) Line 34 C++
myapp64d.exe!boost::asio::detail::win_iocp_socket_service_base::get_reactor() Line 620 + 0xd bytes C++
myapp64d.exe!boost::asio::detail::win_iocp_socket_service_base::start_connect_op(boost::asio::detail::win_iocp_socket_service_base::base_implementation_type & impl, boost::asio::detail::reactor_op * op, const sockaddr * addr, unsigned __int64 addrlen) Line 550 + 0xd bytes C++
myapp64d.exe!boost::asio::detail::win_iocp_socket_service<boost::asio::ip::tcp>::async_connect<boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const & __ptr64>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX * __ptr64>,boost::arg<1> > > >(boost::asio::detail::win_iocp_socket_service<boost::asio::ip::tcp>::implementation_type & impl, const boost::asio::ip::basic_endpoint<boost::asio::ip::tcp> & peer_endpoint, boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const &>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX *>,boost::arg<1> > > & handler) Line 515 C++
myapp64d.exe!boost::asio::stream_socket_service<boost::asio::ip::tcp>::async_connect<boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const & __ptr64>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX * __ptr64>,boost::arg<1> > > >(boost::asio::detail::win_iocp_socket_service<boost::asio::ip::tcp>::implementation_type & impl, const boost::asio::ip::basic_endpoint<boost::asio::ip::tcp> & peer_endpoint, const boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const &>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX *>,boost::arg<1> > > & handler) Line 234 C++
myapp64d.exe!boost::asio::basic_socket<boost::asio::ip::tcp,boost::asio::stream_socket_service<boost::asio::ip::tcp> >::async_connect<boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const & __ptr64>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX * __ptr64>,boost::arg<1> > > >(const boost::asio::ip::basic_endpoint<boost::asio::ip::tcp> & peer_endpoint, const boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const &>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX *>,boost::arg<1> > > & handler) Line 779 C++
myapp64d.exe!MyProtocol::LayerXXX::connect(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & ip, const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & port) Line 225 C++
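What I know so far: boost::asio::ip::tcp::socket::async_connect is the call that blocks, and it is called from the boost::asio::io_service event loop, yet it works most of the time.
Our production system has this problem, and I am looking for any hint that could help us find the cause, a fix, or a workaround.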
Answer 0 (score: 0)
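Solution: exclude 127.0.0.1 in the clumsy filter.
I ran into the same problem when using Boost async_connect together with clumsy, even though I was only using UDP sockets. I found that the problem is caused by clumsy's filtering: if the filtered address range includes 127.0.0.1 (for example, outbound and ip.DstAddr >= 127.0.0.1 and ip.DstAddr <= 239.255.255.255), the problem occurs frequently; if it does not, the problem goes away. So I looked at the Boost async_connect code and found that it calls ::accept on 127.0.0.1 in socket_select_interrupter::open_descriptors, which blocks if network packets are lost.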
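To make the blocking step easier to picture, here is a minimal sketch of the loopback pattern described above: a listener bound to 127.0.0.1, a client socket connected to it, and a blocking accept to complete the pair. It is not Boost's code. The helper name make_loopback_pair is hypothetical, and the sketch assumes POSIX sockets rather than the Winsock calls that appear in the stack trace.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <stdexcept>
#include <utility>

// Hypothetical helper (not part of Boost): builds a connected TCP pair
// over 127.0.0.1 and returns {accepted_fd, client_fd}.
std::pair<int, int> make_loopback_pair()
{
    int acceptor = ::socket(AF_INET, SOCK_STREAM, 0);
    if (acceptor < 0)
        throw std::runtime_error("socket(acceptor) failed");

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); // 127.0.0.1
    addr.sin_port = 0;                             // let the OS pick a free port
    socklen_t len = sizeof(addr);

    // Bind, read back the chosen port, and start listening.
    if (::bind(acceptor, reinterpret_cast<sockaddr*>(&addr), len) < 0 ||
        ::getsockname(acceptor, reinterpret_cast<sockaddr*>(&addr), &len) < 0 ||
        ::listen(acceptor, 1) < 0)
    {
        ::close(acceptor);
        throw std::runtime_error("loopback listener setup failed");
    }

    int client = ::socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0)
    {
        ::close(acceptor);
        throw std::runtime_error("socket(client) failed");
    }

    // The handshake travels over the loopback interface.
    if (::connect(client, reinterpret_cast<sockaddr*>(&addr), len) < 0)
    {
        ::close(client);
        ::close(acceptor);
        throw std::runtime_error("loopback connect failed");
    }

    // Blocking accept: it returns only once a completed connection is queued.
    // If loopback packets are dropped or delayed, the caller waits here.
    int accepted = ::accept(acceptor, nullptr, nullptr);
    ::close(acceptor);
    if (accepted < 0)
    {
        ::close(client);
        throw std::runtime_error("loopback accept failed");
    }
    return {accepted, client};
}
```

On a healthy machine this returns almost instantly, which matches the case where async_connect returns immediately. When a filter drops or delays traffic to 127.0.0.1, the handshake cannot complete and the calling thread stays inside this setup step, which would explain a stack trace that ends in accept.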