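I'm working on a managed Windows service written in C#. It continuously receives messages from several clients connected over TCP/IP. Each client is basically a router that receives messages from thermometers and forwards them to the server. The server parses the messages and stores them in a SQL Server database.
The problem I'm facing is that some clients suddenly stop sending messages. However, as soon as the service is restarted, they connect again and resume sending. I don't have the client's code because it is a third-party device, and I'm pretty sure the problem is on the server.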
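I managed to reduce the problem by implementing a timer that periodically checks whether each client is still connected (see the code below). I also enabled keep-alive on the socket using the socket.IOControl(IOControlCode.KeepAliveValues, ...) method, but the problem still occurs.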
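The Util.SetKeepAlive helper called in the code is not shown in the post. The following is only a sketch of what such a helper might look like, assuming it wraps the IOControl call mentioned above. The class name KeepAliveSketch and the parameter names are hypothetical. On Windows, IOControlCode.KeepAliveValues expects a 12-byte buffer holding three 32-bit values: an on/off flag, the idle time before the first probe, and the interval between probes, both in milliseconds.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper; not the Util class from the original post.
public static class KeepAliveSketch
{
    // Packs the three 32-bit fields of the Winsock tcp_keepalive structure:
    // on/off flag, idle time before the first probe (ms), interval between probes (ms).
    public static byte[] BuildKeepAliveValues(bool enabled, uint timeMs, uint intervalMs)
    {
        byte[] values = new byte[12];
        BitConverter.GetBytes(enabled ? 1u : 0u).CopyTo(values, 0);
        BitConverter.GetBytes(timeMs).CopyTo(values, 4);
        BitConverter.GetBytes(intervalMs).CopyTo(values, 8);
        return values;
    }

    // Applies the values to a socket. This control code is Windows-specific.
    public static void SetKeepAlive(Socket socket, uint timeMs, uint intervalMs)
    {
        byte[] values = BuildKeepAliveValues(true, timeMs, intervalMs);
        socket.IOControl(IOControlCode.KeepAliveValues, values, null);
    }
}
```

With a helper like this, KeepAliveSketch.SetKeepAlive(socket, 30000, 1000) would ask for a first probe after 30 seconds of silence and a retry every second after that.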
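I'm posting the specific parts of the code that I think are relevant. If more snippets are needed to understand the problem, please ask and I'll edit the post. All try/catch blocks were removed to reduce the amount of code.
I'm not looking for a perfect solution; any guidance would be appreciated.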
private Socket _listener;
private ConcurrentDictionary<int, ConnectionState> _connections;

public TcpServer(TcpServiceProvider provider, int port)
{
    this._provider = provider;
    this._port = port;
    this._listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
    this._connections = new ConcurrentDictionary<int, ConnectionState>();
    ConnectionReady = new AsyncCallback(ConnectionReady_Handler);
    AcceptConnection = new WaitCallback(AcceptConnection_Handler);
    ReceivedDataReady = new AsyncCallback(ReceivedDataReady_Handler);
}

public bool Start()
{
    this._listener.Bind(new IPEndPoint(IPAddress.Any, this._port));
    this._listener.Listen(10000);
    this._listener.BeginAccept(ConnectionReady, null);
    return true;
}

// Check every 5 minutes for clients that have not sent any message in the past 30 minutes.
// MSG_RESTART is a command that the device accepts to restart itself.
private void CheckForBrokenConnections()
{
    foreach (var entry in this._connections)
    {
        ConnectionState conn = entry.Value;

        if (conn.ReconnectAttemptCount > 3)
        {
            DropConnection(conn);
            continue;
        }

        if (!conn.Connected || (DateTime.Now - conn.LastResponse).TotalMinutes > 30)
        {
            byte[] message = HexStringToByteArray(MSG_RESTART);

            if (!conn.WaitingToRestart && conn.Write(message, 0, message.Length))
            {
                conn.WaitingToRestart = true;
            }
            else
            {
                DropConnection(conn);
            }
        }
    }
}

private void ConnectionReady_Handler(IAsyncResult ar)
{
    lock (thisLock)
    {
        if (this._listener == null)
            return;

        ConnectionState connectionState = new ConnectionState();
        connectionState.Connection = this._listener.EndAccept(ar);
        connectionState.Server = this;
        connectionState.Provider = (TcpServiceProvider)this._provider.Clone();
        connectionState.Buffer = new byte[4];
        Util.SetKeepAlive(connectionState.Connection, KEEP_ALIVE_TIME, KEEP_ALIVE_TIME);

        int newID = (this._connections.Count == 0 ? 0 : this._connections.Max(x => x.Key)) + 1;
        connectionState.ID = newID;
        this._connections.TryAdd(newID, connectionState);

        ThreadPool.QueueUserWorkItem(AcceptConnection, connectionState);
        this._listener.BeginAccept(ConnectionReady, null);
    }
}

private void AcceptConnection_Handler(object state)
{
    ConnectionState st = state as ConnectionState;
    st.Provider.OnAcceptConnection(st);

    if (st.Connection.Connected)
        st.Connection.BeginReceive(st.Buffer, 0, 0, SocketFlags.None, ReceivedDataReady, st);
}

private void ReceivedDataReady_Handler(IAsyncResult result)
{
    ConnectionState connectionState = null;

    lock (thisLock)
    {
        connectionState = result.AsyncState as ConnectionState;
        connectionState.Connection.EndReceive(result);

        if (connectionState.Connection.Available == 0)
            return;

        // Here the message is parsed
        connectionState.Provider.OnReceiveData(connectionState);

        if (connectionState.Connection.Connected)
            connectionState.Connection.BeginReceive(connectionState.Buffer, 0, 0, SocketFlags.None, ReceivedDataReady, connectionState);
    }
}

internal void DropConnection(ConnectionState connectionState)
{
    lock (thisLock)
    {
        if (this._connections.Values.Contains(connectionState))
        {
            ConnectionState conn;
            this._connections.TryRemove(connectionState.ID, out conn);
        }

        if (connectionState.Connection != null && connectionState.Connection.Connected)
        {
            connectionState.Connection.Shutdown(SocketShutdown.Both);
            connectionState.Connection.Close();
        }
    }
}

Answer 0 (score: 2)
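Two things that I think I see...
If this is a connection that you keep open for multiple messages, you probably should not return from ReceivedDataReady_Handler when connectionState.Connection.Available == 0. IIRC a 0-length packet can be received. So, if the connection is still open, you should call connectionState.Connection.BeginReceive( ... ) before leaving the handler; otherwise nothing is listening on that connection any more and later messages are never read.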
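One way to act on this can be sketched as follows. It assumes the handler reads into a non-empty buffer (unlike the zero-byte receive in the question), in which case EndReceive returning 0 on a TCP socket means the peer has shut the connection down. The class ReceiveLoopSketch and its onData/onClosed callbacks are hypothetical names standing in for the parsing and DropConnection logic; this is not the asker's actual code.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical receive loop: every exit path either re-arms the receive
// or reports the connection as closed, so a connection is never left idle.
public sealed class ReceiveLoopSketch
{
    private readonly Socket _socket;
    private readonly byte[] _buffer = new byte[4096];
    private readonly Action<byte[], int> _onData; // stands in for message parsing
    private readonly Action _onClosed;            // stands in for DropConnection

    public ReceiveLoopSketch(Socket socket, Action<byte[], int> onData, Action onClosed)
    {
        _socket = socket;
        _onData = onData;
        _onClosed = onClosed;
    }

    public void Start()
    {
        Arm();
    }

    private void Arm()
    {
        try
        {
            _socket.BeginReceive(_buffer, 0, _buffer.Length, SocketFlags.None, OnReceive, null);
        }
        catch (SocketException) { _onClosed(); }
        catch (ObjectDisposedException) { _onClosed(); }
    }

    private void OnReceive(IAsyncResult ar)
    {
        int read;
        try
        {
            read = _socket.EndReceive(ar);
        }
        catch (SocketException) { _onClosed(); return; }
        catch (ObjectDisposedException) { _onClosed(); return; }

        if (read == 0)
        {
            // The peer closed its side: clean up instead of silently returning.
            _onClosed();
            return;
        }

        _onData(_buffer, read);
        Arm(); // always post the next receive before leaving the handler
    }
}
```

The point of the design is that the handler has no path that returns without either posting another receive or running the cleanup callback.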
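(I hesitate to put this here because I don't remember the specifics.) There is an event you can handle that tells you when something happens to the underlying connection, including errors and failures while connecting or closing. For the life of me I can't remember the name(s)... This would likely be more efficient than a timer firing every few seconds. It also gives you a way to break out of connections that are stuck in the connecting or closing states.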
Answer 1 (score: 1)
Add try/catch blocks around all the IO calls, and write the errors to a log file. As it stands, the code can't recover when an error occurs.
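As a rough illustration of that advice, here is a minimal sketch of a send wrapper that catches the two exceptions a socket call commonly throws and records them. SafeIo and TrySend are hypothetical names, and the log is an arbitrary TextWriter rather than any particular logging framework.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

// Hypothetical wrapper: IO failures are logged and reported to the caller
// instead of escaping from a callback thread.
public static class SafeIo
{
    // Returns true if all bytes were handed to the socket, false on failure.
    public static bool TrySend(Socket socket, byte[] data, TextWriter log)
    {
        try
        {
            int sent = 0;
            while (sent < data.Length)
            {
                sent += socket.Send(data, sent, data.Length - sent, SocketFlags.None);
            }
            return true;
        }
        catch (SocketException ex)
        {
            log.WriteLine("{0:o} send failed: {1} ({2})", DateTime.UtcNow, ex.SocketErrorCode, ex.Message);
            return false;
        }
        catch (ObjectDisposedException)
        {
            log.WriteLine("{0:o} send failed: socket already closed", DateTime.UtcNow);
            return false;
        }
    }
}
```

A caller can then decide what recovery means for it, for example dropping the connection when TrySend returns false.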
Also, be careful with any lock that has no timeout. These operations should be given a reasonable TTL.
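A lock statement waits indefinitely, so one possible way to bound the wait is Monitor.TryEnter with a timeout. The sketch below is only an illustration; TimedLock and TryRun are hypothetical names, and choosing the timeout and what to do when it expires is left to the caller.

```csharp
using System;
using System.Threading;

// Hypothetical helper: like lock(gate) { action(); } but gives up after a timeout.
public static class TimedLock
{
    // Returns true if the lock was acquired and the action ran, false on timeout.
    public static bool TryRun(object gate, TimeSpan timeout, Action action)
    {
        bool taken = false;
        try
        {
            Monitor.TryEnter(gate, timeout, ref taken);
            if (!taken)
            {
                return false; // another thread held the lock for too long
            }
            action();
            return true;
        }
        finally
        {
            if (taken)
            {
                Monitor.Exit(gate);
            }
        }
    }
}
```

A false return is worth logging, since it suggests some other thread is stuck while holding the lock.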
Answer 2 (score: 1)
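I have run into this kind of situation many times. The problem is probably not in your code but in the network and in the way Windows (on both ends) or the routers handle it. What happens quite often is that a temporary network outage "breaks" the socket, but Windows isn't aware of it, so it doesn't close the socket.
The only way to overcome this is exactly what you did: send keep-alives and monitor connection health. Once you recognize that a connection is down, you need to restart it. However, your code never restarts the listener socket, which is also broken and can't accept new connections. That's why restarting the service helps: it restarts the listener.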
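Restarting the listener without restarting the whole service could look roughly like the sketch below. It is only an illustration under stated assumptions: RestartableListener and its members are hypothetical names, the decision of when to call Restart (for example from the health-check timer) is left out, and whether an immediate rebind to the same port succeeds can depend on the platform and on lingering connections.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical listener wrapper that can be torn down and recreated in place.
public sealed class RestartableListener
{
    private readonly object _gate = new object();
    private readonly Action<Socket> _onClient; // stands in for the per-connection setup
    private Socket _listener;

    public int Port { get; private set; }

    public RestartableListener(int port, Action<Socket> onClient)
    {
        Port = port;
        _onClient = onClient;
    }

    public void Start()
    {
        lock (_gate)
        {
            var s = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
            s.Bind(new IPEndPoint(IPAddress.Any, Port));
            s.Listen(100);
            Port = ((IPEndPoint)s.LocalEndPoint).Port; // resolves port 0 to the real port
            _listener = s;
            s.BeginAccept(OnAccept, s);
        }
    }

    public void Stop()
    {
        lock (_gate)
        {
            if (_listener != null)
            {
                _listener.Close(); // completes the pending accept with an exception
            }
        }
    }

    // Closes the current listener socket and binds a fresh one on the same port.
    public void Restart()
    {
        lock (_gate)
        {
            Stop();
            Start();
        }
    }

    private void OnAccept(IAsyncResult ar)
    {
        var listener = (Socket)ar.AsyncState;
        try
        {
            Socket client = listener.EndAccept(ar);
            _onClient(client);
        }
        catch (ObjectDisposedException) { return; } // listener closed by Stop/Restart
        catch (SocketException) { /* one failed accept; keep listening */ }

        try
        {
            listener.BeginAccept(OnAccept, listener);
        }
        catch (ObjectDisposedException) { }
        catch (SocketException) { }
    }
}
```

The accept callback receives its own listener instance as state, so a callback belonging to an old, closed listener never touches the new one.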