Question

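I have a C# SignalR client (2.2) and an ASP.NET MVC SignalR server on Azure. When a new "entity" is created on the server side, it pushes a simple notification to the client using the following: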

public static class EntityHubHelper
{
    private static readonly IHubContext _hubContext = GlobalHost.ConnectionManager.GetHubContext<EntityHub>();

    public static void EntityCreated(IdentityUser user, Entity entity)
    {
        _hubContext.Clients.User(user.UserName).EntityCreated(entity);
    }
}

[Authorize]
public class EntityHub : Hub
{
    // Just tracing overrides for OnConnected/OnReconnected/OnDisconnected
}

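Occasionally the client or the server reconnects, which is expected, but I'm seeing cases where both reconnect (e.g. after the web server restarts) and the client nevertheless stops receiving data.

This seems to happen after 1-2 days with no data being pushed, and eventually results in a missed push.

Our client trace: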

15/08/02 03:57:23 DEBUG SignalR: StateChanged: Connected -> Reconnecting
15/08/02 03:57:28 DEBUG SignalR: Error: System.Net.WebSockets.WebSocketException (0x80004005): Unable to connect to the remote server ---> System.Net.WebException: The remote server returned an error: (500) Internal Server Error.
15/08/02 03:57:31 DEBUG SignalR: Error: System.Net.WebSockets.WebSocketException (0x80004005): Unable to connect to the remote server ---> System.Net.WebException: The remote server returned an error: (500) Internal Server Error.
15/08/02 03:57:47 DEBUG SignalR: StateChanged: Reconnecting -> Connected
15/08/02 03:57:47 INFO SignalR OnReconnected

Our server trace:

8/2/2015 3:57:57 AM     [SignalR][OnReconnected] Email=correspondinguser@example.com, ConnectionId=ff4e472b-184c-49d4-a662-8b0e26da43e2

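I'm using the server defaults for keepalive and timeout (10s and 30s), and the connection typically uses websockets (enabled on Azure; Standard, so no limit).

I have two questions:

(1) How does the client discover that the server has restarted in the websocket case (in which case the server loses its in-memory record that the client exists)? Are the server's 10s/30s settings pushed down during the initial connect, so that the client decides the server is gone after 30s?

(2) How can I debug this situation? Is there a way to prove that the client is actually still receiving keepalives, so that I know I have some catastrophic problem elsewhere?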

Answer 1

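After various tests and fixes, the problem turned out to be in the IUserIdProvider, in the mapping from user to connection IDs. Adding client-generated keepalives sent as SignalR messages showed that the client and server had indeed reconnected and that the connection stayed healthy, but messages pushed from the server to the client went into a black hole after 1-2 days, possibly with website publishes or appdomain refreshes involved.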
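A client-generated keepalive of this kind can be sketched independently of the transport. The KeepaliveTracker class and its method names below are hypothetical; they are not part of SignalR or of the original code. The sketch assumes the client sends a numbered ping as an ordinary hub message and the server echoes it back through the same per-user push path that normal notifications use, so a missing echo on a connection that still reports "Connected" points at the push path rather than the transport.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a SignalR type): tracks application-level pings
// sent by the client and the echoes pushed back by the server.
public class KeepaliveTracker
{
    private readonly object _sync = new object();
    private readonly Dictionary<long, DateTimeOffset> _pending =
        new Dictionary<long, DateTimeOffset>();
    private readonly TimeSpan _echoTimeout;
    private long _nextId;

    public KeepaliveTracker(TimeSpan echoTimeout)
    {
        _echoTimeout = echoTimeout;
    }

    // Call just before sending a ping to the server; returns the id to send.
    public long RecordPingSent(DateTimeOffset now)
    {
        lock (_sync)
        {
            long id = ++_nextId;
            _pending[id] = now;
            return id;
        }
    }

    // Call from the client-side handler that receives the server's echo.
    public void RecordEchoReceived(long id)
    {
        lock (_sync)
        {
            _pending.Remove(id);
        }
    }

    // True when at least one ping has waited longer than the timeout
    // without its echo arriving.
    public bool IsPushPathSuspect(DateTimeOffset now)
    {
        lock (_sync)
        {
            foreach (DateTimeOffset sentAt in _pending.Values)
            {
                if (now - sentAt > _echoTimeout)
                {
                    return true;
                }
            }
            return false;
        }
    }
}
```

In a real client, RecordPingSent would run on a timer right before the hub method is invoked, and RecordEchoReceived would run inside the handler registered for the echoed message; logging IsPushPathSuspect alongside the connection state separates "transport down" from "pushes lost".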

I replaced the IUserIdProvider with SQL Azure, using the user presence sample recommended by @davidfowl, and tailored it for my existing user/authentication scheme. However, it required some additional changes in PresenceMonitor.cs to improve reliability:

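I had to increase periodsBeforeConsideringZombie from 3 to 6, because it was removing "zombie" connections after 30 seconds or so, when they would not actually disconnect until 50s or so. This meant that connections would sometimes reconnect in the 30-50s range without being tracked in the database.
I had to fix the handling of heartbeat-tracked connections that are not found in the database.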
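The timing mismatch behind the first change can be checked with simple arithmetic. The sketch below assumes the presence check runs on a 10-second timer and that a connection is treated as a zombie once its last activity is older than periodsBeforeConsideringZombie check periods; the ZombieTiming class and its method names are hypothetical and are not taken from the sample.

```csharp
using System;

// Hypothetical helper for reasoning about the zombie threshold.
public static class ZombieTiming
{
    // Age of last activity after which a connection is considered a zombie,
    // assuming threshold = check interval * number of periods.
    public static TimeSpan ZombieThreshold(TimeSpan checkInterval, int periodsBeforeConsideringZombie)
    {
        return TimeSpan.FromTicks(checkInterval.Ticks * periodsBeforeConsideringZombie);
    }

    // True when a connection can be removed from the database as a zombie
    // while it is still able to reconnect, leaving it untracked.
    public static bool HasUntrackedWindow(TimeSpan zombieThreshold, TimeSpan observedDisconnect)
    {
        return zombieThreshold < observedDisconnect;
    }
}
```

With a 10-second interval, 3 periods give a 30-second threshold, which is shorter than the roughly 50 seconds observed before a real disconnect, so an untracked window exists; 6 periods give 60 seconds, which closes it.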

The sample contains the following code in UserPresence.Check():

// Update the client's last activity
if (connection != null)
{
    connection.LastActivity = DateTimeOffset.UtcNow;
}
else
{
    // We have a connection that isn't tracked in our DB!
    // This should *NEVER* happen
    // Debugger.Launch();
}

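However, the case that supposedly should never happen, a heartbeat-tracked connection that is not found in the database, turned out to be fairly common (say, 10% of new connections). This is because the hub's OnConnected event can sometimes be a bit slow, so if your 10-second timer handler is "lucky", it sees the new connection in the heartbeat list before OnConnected has recorded it.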

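I use the following code in UserPresence instead. It gives a connection two timer ticks, or somewhere between 10s and 20s depending on timer "luck", for OnConnected to fire. If the connection is still not tracked in the database after that, I disconnect it so that the client connects again (by handling OnClosed) and there is no message black hole (since I loop over a user's database connections to push a message).

// Connections seen on the previous tick that had no row in the database.
private HashSet<string> notInDbReadyToDisconnect = new HashSet<string>();

private void Check()
{
    // Connections found without a database row on this tick.
    HashSet<string> notInDbReadyToDisconnectNew = new HashSet<string>();

    ...

        else
        {
            // REMOVED: // We have a connection that isn't tracked in our DB!
            // REMOVED: // This should *NEVER* happen
            // REMOVED: // Debugger.Launch();
            string format;
            if (notInDbReadyToDisconnect.Contains(trackedConnection.ConnectionId))
            {
                // Second tick in a row without a database row: disconnect so the client reconnects.
                trackedConnection.Disconnect();
                format = "[SignalR][PresenceMonitor] Disconnecting active connection not tracked in DB (#2), ConnectionId={0}";
            }
            else
            {
                // First tick without a database row: give OnConnected one more tick to finish.
                notInDbReadyToDisconnectNew.Add(trackedConnection.ConnectionId);
                format = "[SignalR][PresenceMonitor] Found active connection not tracked in DB (#1), ConnectionId={0}";
            }
        }

    ...


    // Only connections that were untracked on this tick are candidates on the next one.
    notInDbReadyToDisconnect = notInDbReadyToDisconnectNew;

    ...
}

It gets the job done for a single server, but the HashSet would probably need to move to the database to handle scale-out.

After all of this, everything is very reliable, and my server push code remains very simple.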
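The answer mentions pushing a message by looping over a user's database-tracked connections. The following is a minimal sketch of that idea and is not the author's code: EntityPush, PushToUser, and both delegates are hypothetical names. getConnectionIds stands in for a query against the presence table, and sendToConnection stands in for a per-connection SignalR call such as Clients.Client(connectionId).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: push a payload to every connection that the
// database currently lists for a user.
public static class EntityPush
{
    // Returns the number of connections the payload was sent to.
    // A result of 0 means the user has no tracked connections.
    public static int PushToUser(
        string userName,
        object payload,
        Func<string, IEnumerable<string>> getConnectionIds,
        Action<string, object> sendToConnection)
    {
        int sent = 0;
        foreach (string connectionId in getConnectionIds(userName))
        {
            sendToConnection(connectionId, payload);
            sent++;
        }
        return sent;
    }
}
```

Because delivery depends only on which connection IDs the database returns, a connection that is alive but missing from the database receives nothing, which is why untracked connections are disconnected rather than left in place.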

SignalR: client and server both "reconnect", but pushes do not reach the client
