Improving ASIO thread performance

Date: 2013-06-09 14:11:09

Tags: c++ multithreading performance caching boost-asio

I am working under Ubuntu with gcc 4.7. I am trying to build a network library to send data over UDP.

I set up a thread pool using a boost::asio::io_service, calling io_service::run() from each thread. I use async_receive_from to wait for data without blocking a thread. Once some data has been read, a new async read is started and the received data is passed up my stack by the thread that received it. I create a separate stack for every address I receive data from, and use a tbb::concurrent_hash_map to lock each stack while passing a message up it. You can see the relevant code below.
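
For reference, the pool is set up in the usual way: every thread that calls io_service::run() becomes a worker that may execute completion handlers. A minimal self-contained sketch of that arrangement (the port, buffer size and pool size below are placeholders, not my real values):

    #include <boost/asio.hpp>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main()
    {
        boost::asio::io_service io_service;
        boost::asio::ip::udp::socket socket(io_service,
            boost::asio::ip::udp::endpoint(boost::asio::ip::udp::v4(), 7777));
        boost::asio::ip::udp::endpoint sender;
        char buf[1500];

        /* Pend one asynchronous read; the completion handler runs on
           whichever pool thread the io_service dispatches it to */
        socket.async_receive_from(boost::asio::buffer(buf, sizeof(buf)), sender,
            [&](const boost::system::error_code &ec, size_t bytes)
            {
                if (!ec)
                {
                    std::cout << "got " << bytes << " bytes" << std::endl;
                }
            });

        /* Every thread that calls run() joins the pool of workers */
        std::vector<std::thread> pool;
        for (int i = 0; i < 2; ++i)
        {
            pool.emplace_back([&io_service]() { io_service.run(); });
        }
        for (auto &t : pool)
        {
            t.join();
        }
        return 0;
    }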

Now my problem is that when I run only one thread in the thread pool I receive most of the packets, but when I run 2 threads in the pool I drop many more packets. From this I read it as a threading problem, but I can't pin down exactly what it is. Nothing shows up in my profiler (valgrind --tool=callgrind). Does anyone have any ideas?

public:
    /* Virtual functions for sending and receiving messages */
    virtual void received(const boost::asio::ip::address &addr, msg_data *const data, const msg_header *const header) override
    {
        /* Pend the next read */
        start_receiving();

        /* Check if there exists a parallel stack for this address, if not create it */
        typename stack_map::accessor stack_acc;
        if (_strands.insert(stack_acc, addr.to_string()))
        {
            std::cout << "cloning stack: " << addr.to_string() << std::endl;
            stack_acc->second = this->_up_node->clean_clone();
        }

        /* While holding the lock on this stack, propagate the message */
        stack_acc->second->received(addr, data, header);
    }

    /* Start waiting for data on the socket */
    virtual void start_receiving() override
    {
        /* Prepare receive buffer */
        char *data_buf = new char [MAX_UDP_SIZE];
        std::array<boost::asio::mutable_buffer, 2> recv_buf =
        {{
            boost::asio::buffer(_head_buf, HEADER_SIZE),
            boost::asio::buffer(data_buf, MAX_UDP_SIZE)
        }};

        /* Wait for data */
        _recv_socket.async_receive_from(recv_buf, _recv_endpoint,
            [&,data_buf](const boost::system::error_code &, size_t bytes_transferred)
            {
                received(_recv_endpoint.address(), new msg_data(data_buf, bytes_transferred - HEADER_SIZE), new msg_header(_head_buf));
            });
    }
private:
    typedef tbb::concurrent_hash_map<std::string, stack_component*> stack_map;
    stack_map _strands;
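
For anyone unfamiliar with the accessor idiom above: insert() returns true if the key was newly added, and in either case the accessor holds a write lock on that element until it is released or destroyed, which is what serialises access to each per-address stack. A tiny standalone illustration (the key/value types are made up for the example):

    #include <tbb/concurrent_hash_map.h>
    #include <iostream>
    #include <string>

    typedef tbb::concurrent_hash_map<std::string, int> count_map;

    int main()
    {
        count_map counts;

        /* insert() returns true iff the key was newly added; either
           way the accessor now holds a write lock on that element */
        count_map::accessor acc;
        if (counts.insert(acc, "10.0.0.1"))
        {
            acc->second = 0;    /* first sighting: initialise */
        }
        ++acc->second;          /* still under the per-element lock */
        acc.release();          /* or just let acc go out of scope */

        std::cout << counts.size() << std::endl;
        return 0;
    }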

EDIT: So I rewrote my code into what is shown below, and its performance is no longer affected by the number of threads. My guess is that the problem was related to synchronising data across the caches of multiple cores, but I don't have any evidence for that. Does anyone have a better guess, or some way I could gather evidence (preferably a free one :))?

    /* Start waiting for data on the socket */
    virtual void start_receiving() override
    {
        while (true)
        {
            /* Prepare receive buffer */
            char *data_buf = new char [MAX_UDP_SIZE];
            char *head_buf = new char [HEADER_SIZE];
            std::array<boost::asio::mutable_buffer, 2> recv_buf =
            {{
                boost::asio::buffer(head_buf, HEADER_SIZE),
                boost::asio::buffer(data_buf, MAX_UDP_SIZE)
            }};

            /* Wait for data */
            size_t bytes_transferred = _recv_socket.receive_from(recv_buf, _recv_endpoint);

            _io_service.post([&,head_buf,data_buf,bytes_transferred]()
                {
                    msg_header *header = new msg_header(head_buf);
                    delete [] head_buf;
                    received(_recv_endpoint.address(), new msg_data(data_buf, bytes_transferred - HEADER_SIZE), header);
                });
        }
    }

    /* Virtual functions for sending and receiving messages */
    virtual void received(const boost::asio::ip::address &addr, msg_data *const data, const msg_header *const header) override
    {
        /* Check if there exists a parallel stack for this address, if not create it */
        typename stack_map::accessor stack_acc;
        if (_strands.insert(stack_acc, addr.to_string()))
        {
            stack_acc->second = this->_up_node->clean_clone();
        }

        /* While holding the lock on this stack, propagate the message */
        stack_acc->second->received(addr, data, header);
    }
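
Note that in this version start_receiving() never returns, so it has to run on its own dedicated thread while the pool threads sit in io_service::run() executing the posted handlers. A sketch of that launch arrangement (again with a placeholder port and buffer size):

    #include <boost/asio.hpp>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main()
    {
        boost::asio::io_service io_service;

        /* Keep run() from returning while the handler queue is empty */
        boost::asio::io_service::work work(io_service);

        /* Pool threads only execute the handlers posted from the loop */
        std::vector<std::thread> pool;
        for (int i = 0; i < 2; ++i)
        {
            pool.emplace_back([&io_service]() { io_service.run(); });
        }

        /* Dedicated thread: blocking receive, then hand off the work */
        boost::asio::ip::udp::socket socket(io_service,
            boost::asio::ip::udp::endpoint(boost::asio::ip::udp::v4(), 7777));
        boost::asio::ip::udp::endpoint sender;
        while (true)
        {
            char *buf = new char [1500];
            size_t bytes = socket.receive_from(boost::asio::buffer(buf, 1500), sender);
            io_service.post([buf, bytes]()
                {
                    /* process the datagram, then free the buffer */
                    std::cout << "processing " << bytes << " bytes" << std::endl;
                    delete [] buf;
                });
        }
    }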

0 Answers:

No answers yet.