Server pauses after 16000 requests

Time: 2016-11-29 05:05:04

Tags: c++ boost-asio

I am new to boost::asio. I am trying to run

ab -n 20000 -c 5  -r http://127.0.0.1:9999/

Every time, the test gets stuck after 16000 requests, although it does eventually finish. I am also getting a lot of failed requests.

What the code is doing:

  • a. Create the service
  • b. Create the acceptor
  • c. Bind and listen
  • d. Create a socket
  • e. Perform async_accept
  • f. In the async_accept handler, close the socket, create a new one, and call async_accept again with the same handler.

The code is below:

#include <iostream>
#include <functional>
#include <string>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/lexical_cast.hpp> // needed for boost::lexical_cast below
#include <boost/thread.hpp>
#include <memory>

// global variable for service and acceptor
boost::asio::io_service ioService;
boost::asio::ip::tcp::acceptor accp(ioService);

// callback for accept
void onAccept(const boost::system::error_code &ec, std::shared_ptr<boost::asio::ip::tcp::socket> soc) {
    using boost::asio::ip::tcp;
    soc->send(boost::asio::buffer("In Accept"));
    soc->shutdown(boost::asio::ip::tcp::socket::shutdown_send);
    soc.reset(new tcp::socket(ioService));
    accp.async_accept(*soc, [=](const boost::system::error_code &ec) {
            onAccept(ec, soc);
        });
}

int main(int argc, char *argv[]) {
    using boost::asio::ip::tcp;
    boost::asio::ip::tcp::resolver resolver(ioService);
    try {
        boost::asio::ip::tcp::resolver::query query("127.0.0.1", boost::lexical_cast<std::string>(9999));
        boost::asio::ip::tcp::endpoint endpoint = *resolver.resolve(query);
        accp.open(endpoint.protocol());
        accp.set_option(boost::asio::ip::tcp::acceptor::reuse_address(true));
        accp.bind(endpoint);
        cout << "Ready to accept @ 9999" << endl;

        auto t1 = boost::thread([&]() { ioService.run(); });

        accp.listen(boost::asio::socket_base::max_connections);
        std::shared_ptr<tcp::socket> soc = std::make_shared<tcp::socket>(ioService);

        accp.async_accept(*soc, [=](const boost::system::error_code &ec) { onAccept(ec, soc); });

        t1.join();
    } catch (std::exception &ex) {
        std::cout << "[" << boost::this_thread::get_id() << "] Exception: " << ex.what() << std::endl;
    }
}

For completeness:

  1. I changed my code as per @Arunmu's answer.
  2. Because of the socket issue @david-schwartz pointed out, I switched to Docker and Linux.
  3. The server now never hangs.
    • Single-threaded - 6045 req/sec
    • Threaded - 5849 req/sec
  4. Switched to async_write (a rough sketch follows below).
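
For reference, here is a rough sketch of what the async_write change might look like. This is my assumption of its shape, not the exact final code; it reuses the global ioService and accp from the listing above and, like the original handler, does not read the request:

void onAccept(const boost::system::error_code &ec,
              std::shared_ptr<boost::asio::ip::tcp::socket> soc) {
    using boost::asio::ip::tcp;
    // static storage, so the buffer outlives the asynchronous write
    static const char response[] = "HTTP/1.1 200 OK\r\n\r\n\r\n";

    // capture the shared_ptr in the handler so the socket stays alive
    boost::asio::async_write(*soc,
        boost::asio::buffer(response, sizeof(response) - 1),
        [soc](const boost::system::error_code &ec, std::size_t /*bytes*/) {
            boost::system::error_code ignored;
            soc->shutdown(tcp::socket::shutdown_send, ignored);
            soc->close(ignored);
        });

    // queue an accept for the next connection on a fresh socket
    auto next = std::make_shared<tcp::socket>(ioService);
    accp.async_accept(*next, [next](const boost::system::error_code &ec) {
        onAccept(ec, next);
    });
}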

2 Answers:

Answer 0 (score: 3)

First, let's do things a bit more correctly. I have changed the code to use standalone Asio instead of the Boost one, and to use C++14 features. Compared to the original code, my changes reduce the number of failed requests considerably.

Code:

#include <iostream>
#include <functional>
#include <string>
#include <asio.hpp>
#include <thread>
#include <memory>
#include <system_error>
#include <chrono>
#include <cstring>   // std::strlen

//global variable for service and acceptor
asio::io_service ioService;
asio::ip::tcp::acceptor accp(ioService); 

const char* response = "HTTP/1.1 200 OK\r\n\r\n\r\n";

//callback for accept 
void onAccept(const std::error_code& ec, std::shared_ptr<asio::ip::tcp::socket> soc)
{
    using asio::ip::tcp;
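    // disable Nagle's algorithm so the small response is flushed immediately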
    soc->set_option(asio::ip::tcp::no_delay(true));
    auto buf = new asio::streambuf;
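    // read the request headers up to the blank line; without this the client's
    // data would just pile up in the socket receive buffer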
    asio::async_read_until(*soc, *buf, "\r\n\r\n",
        [=](auto ec, auto siz) {
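          // send the response, then shut down the send side, free the buffer and close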
          asio::write(*soc, asio::buffer(response, std::strlen(response)));
          soc->shutdown(asio::ip::tcp::socket::shutdown_send);
          delete buf;
          soc->close();
        });
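    // queue an accept for the next connection on a fresh socket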
    auto nsoc = std::make_shared<tcp::socket>(ioService);
    //soc.reset(new tcp::socket(ioService));
    accp.async_accept(*nsoc, [=](const std::error_code& ec){
      onAccept(ec, nsoc);
    });

}

int main( int argc, char * argv[] )
{
    using asio::ip::tcp;
    asio::ip::tcp::resolver resolver(ioService);

    try{
        asio::ip::tcp::resolver::query query( 
            "127.0.0.1", 
            std::to_string(9999)
        );

     asio::ip::tcp::endpoint endpoint = *resolver.resolve( query );
     accp.open( endpoint.protocol() );
     accp.set_option( asio::ip::tcp::acceptor::reuse_address( true ) );
     accp.bind( endpoint );

     std::cout << "Ready to accept @ 9999" << std::endl;

     auto t1 = std::thread([&]() { ioService.run(); });
     auto t2 = std::thread([&]() { ioService.run(); });

     accp.listen( 1000 );

     std::shared_ptr<tcp::socket> soc = std::make_shared<tcp::socket>(ioService);

     accp.async_accept(*soc, [=](const std::error_code& ec) {
                                onAccept(ec, soc);
                              });

    t1.join();
    t2.join();
    } catch(const std::exception & ex){
      std::cout << "[" << std::this_thread::get_id()
        << "] Exception: " << ex.what() << std::endl;
    } catch (...) {
      std::cerr << "Caught unknown exception" << std::endl;
    }
}

The main changes are:

  1. Send a proper HTTP response.
  2. Read the request. Otherwise you are just filling up your socket receive buffer.
  3. Shut the socket down properly.
  4. Use multiple threads. This was needed mainly for Mac OS; it is not required on Linux.
  5. Test command used: ab -n 20000 -c 1 -r http://127.0.0.1:9999/

    On Linux, the test passes without any errors, and without needing an additional thread for the io_service.

    However, on Mac I was able to reproduce the issue: it got stuck after serving 16000 requests. A process sample taken at that moment:

    Call graph:
        906 Thread_1887605   DispatchQueue_1: com.apple.main-thread  (serial)
        + 906 start  (in libdyld.dylib) + 1  [0x7fff868bc5c9]
        +   906 main  (in server_hangs_so) + 2695  [0x10d3622b7]
        +     906 std::__1::thread::join()  (in libc++.1.dylib) + 20  [0x7fff86ad6ba0]
        +       906 __semwait_signal  (in libsystem_kernel.dylib) + 10  [0x7fff8f44c48a]
        906 Thread_1887609
        + 906 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fff8d0983ed]
        +   906 _pthread_start  (in libsystem_pthread.dylib) + 176  [0x7fff8d09afd7]
        +     906 _pthread_body  (in libsystem_pthread.dylib) + 131  [0x7fff8d09b05a]
        +       906 void* std::__1::__thread_proxy<std::__1::tuple<main::$_2> >(void*)  (in server_hangs_so) + 124  [0x10d36317c]
        +         906 asio::detail::scheduler::run(std::__1::error_code&)  (in server_hangs_so) + 181  [0x10d36bc25]
        +           906 asio::detail::scheduler::do_run_one(asio::detail::scoped_lock<asio::detail::posix_mutex>&, asio::detail::scheduler_thread_info&, std::__1::error_code const&)  (in server_hangs_so) + 393  [0x10d36bfe9]
        +             906 kevent  (in libsystem_kernel.dylib) + 10  [0x7fff8f44d21a]
        906 Thread_1887610
          906 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fff8d0983ed]
            906 _pthread_start  (in libsystem_pthread.dylib) + 176  [0x7fff8d09afd7]
              906 _pthread_body  (in libsystem_pthread.dylib) + 131  [0x7fff8d09b05a]
                906 void* std::__1::__thread_proxy<std::__1::tuple<main::$_3> >(void*)  (in server_hangs_so) + 124  [0x10d36324c]
                  906 asio::detail::scheduler::run(std::__1::error_code&)  (in server_hangs_so) + 181  [0x10d36bc25]
                    906 asio::detail::scheduler::do_run_one(asio::detail::scoped_lock<asio::detail::posix_mutex>&, asio::detail::scheduler_thread_info&, std::__1::error_code const&)  (in server_hangs_so) + 263  [0x10d36bf67]
                      906 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff8f44c136]
    
    Total number in stack (recursive counted multiple, when >=5):
    
    Sort by top of stack, same collapsed (when >= 5):
            __psynch_cvwait  (in libsystem_kernel.dylib)        906
            __semwait_signal  (in libsystem_kernel.dylib)        906
            kevent  (in libsystem_kernel.dylib)        906
    

    Only after providing an extra thread was I able to complete the test, with the following result:

    Benchmarking 127.0.0.1 (be patient)
    Completed 2000 requests
    Completed 4000 requests
    Completed 6000 requests
    Completed 8000 requests
    Completed 10000 requests
    Completed 12000 requests
    Completed 14000 requests
    Completed 16000 requests
    Completed 18000 requests
    Completed 20000 requests
    Finished 20000 requests
    
    
    Server Software:
    Server Hostname:        127.0.0.1
    Server Port:            9999
    
    Document Path:          /
    Document Length:        2 bytes
    
    Concurrency Level:      1
    Time taken for tests:   33.328 seconds
    Complete requests:      20000
    Failed requests:        3
       (Connect: 1, Receive: 1, Length: 1, Exceptions: 0)
    Total transferred:      419979 bytes
    HTML transferred:       39998 bytes
    Requests per second:    600.09 [#/sec] (mean)
    Time per request:       1.666 [ms] (mean)
    Time per request:       1.666 [ms] (mean, across all concurrent requests)
    Transfer rate:          12.31 [Kbytes/sec] received
    
    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0  30.7      0    4346
    Processing:     0    1 184.4      0   26075
    Waiting:        0    0   0.0      0       1
    Total:          0    2 186.9      0   26075
    
    Percentage of the requests served within a certain time (ms)
      50%      0
      66%      0
      75%      0
      80%      0
      90%      0
      95%      0
      98%      0
      99%      0
     100%  26075 (longest request)
    

    Stack trace of the thread that is probably stuck:

    * thread #3: tid = 0x0002, 0x00007fff8f44d21a libsystem_kernel.dylib`kevent + 10, stop reason = signal SIGSTOP
      * frame #0: 0x00007fff8f44d21a libsystem_kernel.dylib`kevent + 10
        frame #1: 0x0000000109c482ec server_hangs_so`asio::detail::kqueue_reactor::run(bool, asio::detail::op_queue<asio::detail::scheduler_operation>&) + 268
        frame #2: 0x0000000109c48039 server_hangs_so`asio::detail::scheduler::do_run_one(asio::detail::scoped_lock<asio::detail::posix_mutex>&, asio::detail::scheduler_thread_info&, std::__1::error_code const&) + 393
        frame #3: 0x0000000109c47c75 server_hangs_so`asio::detail::scheduler::run(std::__1::error_code&) + 181
        frame #4: 0x0000000109c3f2fc server_hangs_so`void* std::__1::__thread_proxy<std::__1::tuple<main::$_3> >(void*) + 124
        frame #5: 0x00007fff8d09b05a libsystem_pthread.dylib`_pthread_body + 131
        frame #6: 0x00007fff8d09afd7 libsystem_pthread.dylib`_pthread_start + 176
        frame #7: 0x00007fff8d0983ed libsystem_pthread.dylib`thread_start + 13
    

    This might be an issue with asio's kqueue_reactor implementation or with the Mac system itself (less likely).

    UPDATE: The same behaviour is also observed with libevent, so the asio implementation is not the problem. It must be some bug in the kernel's kqueue implementation; the issue is not seen with epoll on Linux.

Answer 1 (score: 2)

You are running out of local sockets. You should not be generating all of the load from a single IP address. Each finished connection leaves its client-side port in TIME_WAIT for a while, and the default ephemeral port range is only around 16000 ports, which is presumably why the stall shows up right around the 16000-request mark. (Also, your load generator should be smart enough to detect and work around this condition, but many are not.)