Question

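I have an application that receives, processes, and transmits UDP packets.

Everything works fine if the receive and transmit port numbers are different.

If the port numbers are the same and the IP addresses are different, it usually works fine, except when the destination IP address is on the same subnet as the machine running the application. In that case the send_to function takes several seconds to complete, instead of the usual few milliseconds.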

Rx Port  Tx IP             Tx Port  Result

5001     Same subnet       5002     OK     Delay ~ 0.001 secs
5001     Different subnet  5001     OK     Delay ~ 0.001 secs
5001     Same subnet       5001     Fails  Delay > 2 secs

Here is a short program that demonstrates the problem.

#include <ctime>
#include <iostream>
#include <string>
#include <boost/array.hpp>
#include <boost/asio.hpp>
#include <windows.h>   // QueryPerformanceCounter, QueryPerformanceFrequency

using boost::asio::ip::udp;
using std::cout;
using std::endl;

int test( const std::string& output_IP)
{
    try
    {
        unsigned short prev_seq_no;

        boost::asio::io_service io_service;

        // build the input socket

        /* This is connected to a UDP client that is running continuously
        sending messages that include an incrementing sequence number
        */

        const int input_port = 5001;
        udp::socket input_socket(io_service, udp::endpoint(udp::v4(), input_port ));

        // build the output socket

        const std::string output_Port = "5001";
        udp::resolver resolver(io_service);
        udp::resolver::query query(udp::v4(), output_IP, output_Port );
        udp::endpoint output_endpoint = *resolver.resolve(query);
        udp::socket output_socket( io_service );
        output_socket.open(udp::v4());

        // double output buffer size
        boost::asio::socket_base::send_buffer_size option( 8192 * 2 );
        output_socket.set_option(option);

        cout  << "TX to " << output_endpoint.address() << ":"  << output_endpoint.port() << endl;



        int count = 0;
        for (;;)
        {
            // receive packet
            unsigned short recv_buf[ 20000 ];
            udp::endpoint remote_endpoint;
            boost::system::error_code error;
            int bytes_received = input_socket.receive_from(boost::asio::buffer(recv_buf,20000),
                                 remote_endpoint, 0, error);

            if (error && error != boost::asio::error::message_size)
                throw boost::system::system_error(error);

            // start timer
            __int64 TimeStart;
            QueryPerformanceCounter( (LARGE_INTEGER *)&TimeStart );

            // send onwards
            boost::system::error_code ignored_error;
            output_socket.send_to(
                boost::asio::buffer(recv_buf,bytes_received),
                output_endpoint, 0, ignored_error);

            // stop time and display tx time
            __int64 TimeEnd;
            QueryPerformanceCounter( (LARGE_INTEGER *)&TimeEnd );
            __int64 f;
            QueryPerformanceFrequency( (LARGE_INTEGER *)&f );
            cout << "Send time secs " << (double) ( TimeEnd - TimeStart ) / (double) f << endl;

            // stop after loops
            if( count++ > 10 )
                break;
        }
    }
    catch (std::exception& e)
    {
        std::cerr << e.what() << std::endl;
        return 1;
    }

    // test() is declared to return int, so return a value on every path
    return 0;
}
int main(  )
{

    test( "193.168.1.200" );

    test( "192.168.1.200" );

    return 0;
}

Output from this program when run on a machine at address 192.168.1.101:

TX to 193.168.1.200:5001
Send time secs 0.0232749
Send time secs 0.00541566
Send time secs 0.00924535
Send time secs 0.00449014
Send time secs 0.00616714
Send time secs 0.0199299
Send time secs 0.00746081
Send time secs 0.000157972
Send time secs 0.000246775
Send time secs 0.00775578
Send time secs 0.00477618
Send time secs 0.0187321
TX to 192.168.1.200:5001
Send time secs 1.39485
Send time secs 3.00026
Send time secs 3.00104
Send time secs 0.00025927
Send time secs 3.00163
Send time secs 2.99895
Send time secs 6.64908e-005
Send time secs 2.99864
Send time secs 2.98798
Send time secs 3.00001
Send time secs 3.00124
Send time secs 9.86207e-005

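Why is this happening? Is there any way I can reduce the delay?

Notes:

- Built using Code::Blocks, running under various versions of Windows
- Packets are 10000 bytes long
- The problem goes away if I connect the computer running the application to a second network, for example a WWLAN (a cellular network "rocket stick")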

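As far as I can tell, this is the situation we have:

- This works (different ports, same LAN)
- This also works (same port, different LANs)
- This does not work (same port, same LAN)
- This seems to work (same port, same LAN, dual-homed Module2 host)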

Answer 1

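Given that this is being observed on Windows for large datagrams whose destination address is a non-existent peer within the same subnet as the sender, the problem is likely the result of send() blocking while it waits for an Address Resolution Protocol (ARP) response so that the layer 2 Ethernet frame can be populated:

When sending data, the layer 2 Ethernet frame is populated with the media access control (MAC) address of the next hop in the route. If the sender does not know the MAC address of the next hop, it broadcasts an ARP request and caches the response. Using the sender's subnet mask and the destination address, the sender can determine whether the next hop is on the same subnet as the sender or whether the data must be routed through the default gateway. Based on the results in the question, when sending large datagrams:
- datagrams destined to a different subnet incur no delay, because the default gateway's MAC address is already in the sender's ARP cache
- datagrams destined to a non-existent peer on the sender's subnet incur a delay while waiting for ARP resolution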
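
The next-hop decision described above can be sketched as a mask comparison. This is only an illustration: the helper names `ipv4` and `on_same_subnet` are made up for this example, and the 255.255.255.0 mask is an assumption, since the question does not state the sender's subnet mask.

```cpp
#include <cstdint>

// Pack four octets into a host-order 32-bit IPv4 address.
// (Hypothetical helper, for illustration only.)
constexpr std::uint32_t ipv4(std::uint8_t a, std::uint8_t b,
                             std::uint8_t c, std::uint8_t d)
{
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8)  |  std::uint32_t(d);
}

// True when src and dst share the same network prefix under mask.
// In that case the sender must resolve dst's own MAC address via ARP;
// otherwise it only needs the default gateway's MAC address.
constexpr bool on_same_subnet(std::uint32_t src, std::uint32_t dst,
                              std::uint32_t mask)
{
    return (src & mask) == (dst & mask);
}
```

With the sender at 192.168.1.101 and the assumed mask, `on_same_subnet` is true for 192.168.1.200 (the slow case in the question) and false for 193.168.1.200 (the fast case, which goes through the gateway).
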
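The socket's send buffer size (SO_SNDBUF) is being set to 16384 bytes, but the datagrams being sent are 10000 bytes each. The behavior of send() when the buffer is saturated is unspecified, but some systems will block in send(). In this case, saturation would occur fairly quickly if any datagram incurs a delay, such as by waiting for an ARP response.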
```
// Datagrams being sent are 10000 bytes, but the socket buffer is 16384.
boost::asio::socket_base::send_buffer_size option(8192 * 2);
output_socket.set_option(option);
```
Consider letting the kernel manage the socket buffer size, or increasing it based on your expected throughput.
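
To make the arithmetic concrete, here is a rough sketch of how little headroom the configured buffer leaves. Both helpers are hypothetical, and how the network stack actually accounts for bytes against SO_SNDBUF is implementation-specific, so treat the numbers as an approximation rather than a guarantee.

```cpp
#include <cstddef>

// Number of whole datagrams of datagram_size bytes that fit in a send
// buffer of buffer_size bytes (ignores any per-datagram overhead).
constexpr std::size_t datagrams_that_fit(std::size_t buffer_size,
                                         std::size_t datagram_size)
{
    return datagram_size == 0 ? 0 : buffer_size / datagram_size;
}

// Hypothetical sizing rule: leave room for `queued` datagrams that may
// pile up while one send is stalled (for example on ARP resolution).
constexpr std::size_t buffer_for(std::size_t datagram_size,
                                 std::size_t queued)
{
    return datagram_size * queued;
}
```

Under these assumptions, `datagrams_that_fit(16384, 10000)` is 1, so a single stalled datagram is enough to saturate the buffer. A value such as `buffer_for(10000, 8)` could be passed to `boost::asio::socket_base::send_buffer_size` instead, though the operating system may adjust the size it is asked for.
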
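When sending a datagram whose size exceeds the Windows registry FastSendDatagramThreshold parameter, the send() call can block until the datagram has been sent. For more details, see the Microsoft TCP/IP Implementation Details:

Datagrams smaller than the value of this parameter go through the fast I/O path or are buffered on send. Larger ones are held until the datagram is actually sent. The default value was found by testing to be the best overall value for performance. Fast I/O means copying data and bypassing the I/O subsystem, instead of mapping memory and going through the I/O subsystem. This is advantageous for small amounts of data. Changing this value is not generally recommended.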

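If you observe a delay on every send() to an existing peer on the sender's subnet, then profile and analyze the network:

- Use iperf to measure the network's potential throughput.
- Use a packet analyzer such as Wireshark to get a deeper view into what is occurring on a given node. Look for ARP requests and responses.
- From the sender's machine, ping the peer and then check the ARP cache. Verify that there is a cache entry for the peer and that it is correct.
- Try a different port and/or TCP. This can help identify whether a network policy is throttling or shaping traffic for a particular port or protocol.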

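Also note that sending datagrams below the FastSendDatagramThreshold value in quick succession while waiting for ARP to resolve may cause datagrams to be discarded:

ARP queues only one outbound IP datagram for a specified destination address while that IP address is being resolved to a media access control address. If a User Datagram Protocol (UDP)-based application sends multiple IP datagrams to a single destination address without any pauses between them, some of the datagrams may be dropped if there is no ARP cache entry already present. An application can compensate for this by calling the iphlpapi.dll routine SendArp() to establish an ARP cache entry before sending the stream of packets.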
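
A minimal sketch of that workaround, assuming a Windows build linked against iphlpapi and ws2_32. The helpers `format_mac` and `prime_arp_cache` are hypothetical names for this example; only `SendARP` and `inet_addr` are real Windows APIs. `SendARP` is a synchronous call, so for an absent peer it will still take time to fail, but that cost is paid once up front rather than inside the send loop.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <winsock2.h>
#include <iphlpapi.h>   // SendARP; link with iphlpapi and ws2_32
#endif

// Format a hardware address as lowercase hex bytes separated by '-'.
std::string format_mac(const unsigned char* mac, std::size_t len)
{
    std::string out;
    char byte[4];
    for (std::size_t i = 0; i < len; ++i)
    {
        std::snprintf(byte, sizeof byte, "%02x", static_cast<unsigned>(mac[i]));
        if (i != 0)
            out += '-';
        out += byte;
    }
    return out;
}

// Hypothetical helper: resolve ipv4_text via ARP so that a cache entry
// exists before the first datagram is sent. Returns false when the
// address is malformed, when no peer answers, or on non-Windows builds.
bool prime_arp_cache(const char* ipv4_text, std::string& mac_out)
{
#ifdef _WIN32
    const IPAddr dest = inet_addr(ipv4_text);
    if (dest == INADDR_NONE)
        return false;
    ULONG mac[2] = {0, 0};   // room for a 6-byte MAC address
    ULONG mac_len = 6;
    if (SendARP(dest, 0, mac, &mac_len) != NO_ERROR)
        return false;
    mac_out = format_mac(reinterpret_cast<const unsigned char*>(mac), mac_len);
    return true;
#else
    (void)ipv4_text;
    (void)mac_out;
    return false;            // SendARP is a Windows-only API
#endif
}
```

In the question's program, a call such as `prime_arp_cache("192.168.1.200", mac)` could be made once before the loop; if it returns false, the application knows the peer is unreachable and can skip or defer sending rather than stalling on each send_to.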

Answer 2

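It is good practice to separate the Tx and Rx ports. I derive my own socket class from CAsyncSocket because it has a message pump: when data is received on your socket, it sends a system message and calls the OnReceive function (yours if you override the underlying virtual function, or, if you do not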

Long delay when sending UDP packets
