SQL Server transient exception numbers

Time: 2016-02-25 14:58:18

Tags: c# sql-server

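I want to write some wrapper code for my database calls (using C# and Microsoft technologies to access the database) that automatically retries on "transient" exceptions. By transient I mean something that has a good chance of eventually resolving itself (as opposed to logic errors, which will never work). Examples I can think of include:

  • Deadlocks
  • Connection timeouts
  • Command timeouts

I had planned to use the SqlException error number to detect these. For example: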

List<RunStoredProcedureResultType> resultSet = null;
int limit = 3;
for (int i = 0; i < limit; ++i)
{
    bool isLast = i == limit - 1;
    try
    {
        using (var db = /* ... */)
        {
            resultSet = db.RunStoredProcedure(param1, param2).ToList();
        }
        //if it gets here it was successful
        break;
    }
    catch (SqlException ex)
    {
        if (isLast)
        {
            //3 transient errors in a row. So just kill it
            throw;
        }
        switch (ex.Number)
        {
            case 1205: //deadlock
            case -2:   //timeout (command timeout?)
            case 11:   //timeout (connection timeout?)
                // do nothing - continue the loop
                break;
            default:
                //a non-transient error. Just throw the exception on
                throw;
        }
    }
    Thread.Sleep(TimeSpan.FromSeconds(1)); //some kind of delay - might not use Sleep
}
return resultSet;

(Please excuse any errors - I just wrote this off the top of my head. I also realise I could wrap this up nicely...)
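One way the loop above could be wrapped up is sketched below. This is a minimal illustration, not the asker's actual code: `RetryHelper`, `Execute`, `maxAttempts` and `delay` are hypothetical names, and the decision about what counts as transient is passed in as a predicate so that the list of error numbers stays separate from the retry mechanics.

```csharp
using System;
using System.Threading;

// Hypothetical helper: retries an operation while a caller-supplied
// predicate classifies the thrown exception as transient.
public static class RetryHelper
{
    public static T Execute<T>(
        Func<T> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,          // illustrative default, mirrors "limit = 3" above
        TimeSpan? delay = null)       // illustrative default of one second
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (isTransient == null) throw new ArgumentNullException(nameof(isTransient));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan wait = delay ?? TimeSpan.FromSeconds(1);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Transient failure with attempts left: wait, then try again.
                // On the last attempt, or for a non-transient error, the filter
                // is false and the original exception propagates unchanged.
                Thread.Sleep(wait);
            }
        }
    }
}
```

With a helper like this, the call site from the question would shrink to something like `RetryHelper.Execute(() => RunIt(), ex => ex is SqlException sql && sql.Number == 1205)`, where `RunIt` stands in for the stored procedure call.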

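So the key question is: which numbers should I consider 'transient'? (I realise that what I consider transient may differ from what others consider transient.) I found a nice list here:

https://msdn.microsoft.com/en-us/library/cc645603.aspx

But it's massive and not very helpful. Has anyone else built up a list that they use for something similar?

Update

In the end we went for a 'bad list' of errors: if the error is on a known list of 'non-transient' errors (usually programmer errors), we do not retry it, and anything else is treated as transient. I have posted the list of numbers we are using as an answer.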

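4 answers:

Answer 0: (score: 7)

For SQL Azure there is a class, [SqlDatabaseTransientErrorDetectionStrategy.cs], used for transient fault handling. It covers almost every exception code that can be considered transient. It is also a complete implementation of a retry strategy.

Adding the code snippet here for future reference: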

/// <summary>
/// Error codes reported by the DBNETLIB module.
/// </summary>
private enum ProcessNetLibErrorCode
{
    ZeroBytes = -3,

    Timeout = -2,
    /* Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. */

    Unknown = -1,

    InsufficientMemory = 1,

    AccessDenied = 2,

    ConnectionBusy = 3,

    ConnectionBroken = 4,

    ConnectionLimit = 5,

    ServerNotFound = 6,

    NetworkNotFound = 7,

    InsufficientResources = 8,

    NetworkBusy = 9,

    NetworkAccessDenied = 10,

    GeneralError = 11,

    IncorrectMode = 12,

    NameNotFound = 13,

    InvalidConnection = 14,

    ReadWriteError = 15,

    TooManyHandles = 16,

    ServerError = 17,

    SSLError = 18,

    EncryptionError = 19,

    EncryptionNotSupported = 20
}

There is also a switch statement that checks the error number returned in the SQL exception:

switch (err.Number)
{
    // SQL Error Code: 40501
    // The service is currently busy. Retry the request after 10 seconds. Code: (reason code to be decoded).
    case ThrottlingCondition.ThrottlingErrorNumber:
        // Decode the reason code from the error message to determine the grounds for throttling.
        var condition = ThrottlingCondition.FromError(err);

        // Attach the decoded values as additional attributes to the original SQL exception.
        sqlException.Data[condition.ThrottlingMode.GetType().Name] =
            condition.ThrottlingMode.ToString();
        sqlException.Data[condition.GetType().Name] = condition;

        return true;

    // SQL Error Code: 10928
    // Resource ID: %d. The %s limit for the database is %d and has been reached.
    case 10928:
    // SQL Error Code: 10929
    // Resource ID: %d. The %s minimum guarantee is %d, maximum limit is %d and the current usage for the database is %d. 
    // However, the server is currently too busy to support requests greater than %d for this database.
    case 10929:
    // SQL Error Code: 10053
    // A transport-level error has occurred when receiving results from the server.
    // An established connection was aborted by the software in your host machine.
    case 10053:
    // SQL Error Code: 10054
    // A transport-level error has occurred when sending the request to the server. 
    // (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    case 10054:
    // SQL Error Code: 10060
    // A network-related or instance-specific error occurred while establishing a connection to SQL Server. 
    // The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server 
    // is configured to allow remote connections. (provider: TCP Provider, error: 0 - A connection attempt failed 
    // because the connected party did not properly respond after a period of time, or established connection failed 
    // because connected host has failed to respond.)
    case 10060:
    // SQL Error Code: 40197
    // The service has encountered an error processing your request. Please try again.
    case 40197:
    // SQL Error Code: 40540
    // The service has encountered an error processing your request. Please try again.
    case 40540:
    // SQL Error Code: 40613
    // Database XXXX on server YYYY is not currently available. Please retry the connection later. If the problem persists, contact customer 
    // support, and provide them the session tracing ID of ZZZZZ.
    case 40613:
    // SQL Error Code: 40143
    // The service has encountered an error processing your request. Please try again.
    case 40143:
    // SQL Error Code: 233
    // The client was unable to establish a connection because of an error during connection initialization process before login. 
    // Possible causes include the following: the client tried to connect to an unsupported version of SQL Server; the server was too busy 
    // to accept new connections; or there was a resource limitation (insufficient memory or maximum allowed connections) on the server. 
    // (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
    case 233:
    // SQL Error Code: 64
    // A connection was successfully established with the server, but then an error occurred during the login process. 
    // (provider: TCP Provider, error: 0 - The specified network name is no longer available.) 
    case 64:
    // DBNETLIB Error Code: 20
    // The instance of SQL Server you attempted to connect to does not support encryption.
    case (int)ProcessNetLibErrorCode.EncryptionNotSupported:
        return true;
}

See the full source here.
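For anyone who only needs the numbers, the switch statement above can be condensed into a lookup set. The sketch below is an assumption-laden simplification, not the library's code: `AzureTransientErrors` and `IsTransient` are hypothetical names, the numbers are copied from the case labels and comments above (40501 being the throttling error), and the throttling reason-code decoding is left out.

```csharp
using System.Collections.Generic;

// Hypothetical allow-list built from the case labels quoted above.
public static class AzureTransientErrors
{
    private static readonly HashSet<int> Numbers = new HashSet<int>
    {
        40501,                      // service busy / throttling
        10928, 10929,               // resource limits reached
        10053, 10054, 10060,        // transport-level and connection errors
        40197, 40540, 40143,        // "service has encountered an error"
        40613,                      // database not currently available
        233, 64,                    // errors during connection initialization / login
        20                          // DBNETLIB: EncryptionNotSupported
    };

    // Returns true when the error number is on the allow-list.
    public static bool IsTransient(int errorNumber)
    {
        return Numbers.Contains(errorNumber);
    }
}
```

Since a SqlException can carry several errors, a caller would typically run a check like this over each `SqlError.Number` in `ex.Errors` rather than only over `ex.Number`.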

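Answer 1: (score: 4)

Sorry for answering my own question, but in case anyone is still interested: we have just started building our own list of error codes. Not ideal, but we figured this shouldn't happen often.

We went for a 'bad list' approach rather than the 'good list' the question suggested. The IDs we have so far are: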

PARAMETER_NOT_SUPPLIED = 201;
CANNOT_INSERT_NULL_INTO_NON_NULL = 515;
FOREIGN_KEY_VIOLATION = 547;
PRIMARY_KEY_VIOLATION = 2627;
MEMORY_ALLOCATION_FAILED = 4846;
ERROR_CONVERTING_NUMERIC_TO_DECIMAL = 8114; 
TOO_MANY_ARGUMENTS = 8144;
ARGUMENT_IS_NOT_A_PARAMETER = 8145;
ARGS_SUPPLIED_FOR_PROCEDURE_WITHOUT_PARAMETERS = 8146;
STRING_OR_BINARY_TRUNCATED = 8152;
INVALID_POINTER = 10006;
WRONG_NUMBER_OF_PARAMETERS = 18751;
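The 'bad list' idea can be sketched as a set lookup that is inverted compared with an allow-list: anything not known to be a programmer error is retried. `NonTransientErrors` and `ShouldRetry` are hypothetical names used for illustration; the numbers are the ones listed above.

```csharp
using System.Collections.Generic;

// Hypothetical deny-list: errors on this list are never retried,
// everything else is assumed to be worth another attempt.
public static class NonTransientErrors
{
    private static readonly HashSet<int> Numbers = new HashSet<int>
    {
        201,    // parameter not supplied
        515,    // cannot insert NULL into non-null column
        547,    // foreign key violation
        2627,   // primary key violation
        4846,   // memory allocation failed
        8114,   // error converting numeric to decimal
        8144,   // too many arguments
        8145,   // argument is not a parameter
        8146,   // args supplied for procedure without parameters
        8152,   // string or binary data truncated
        10006,  // invalid pointer
        18751   // wrong number of parameters
    };

    public static bool ShouldRetry(int errorNumber)
    {
        return !Numbers.Contains(errorNumber);
    }
}
```

The trade-off of this design is that an unknown permanent error gets retried a few times before it surfaces, whereas an allow-list would fail fast on unknown errors but miss transient ones nobody has listed yet.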

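Another thing we noticed: if the connection pool times out, you do not get a SqlException. Instead you get an InvalidOperationException reporting "Timeout expired". It's a shame that it isn't a SqlException, but it is well worth catching.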
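A rough way to recognise that case is sketched below. It is an assumption that matching on the message text is acceptable: the check relies on the "Timeout expired" wording mentioned above, which is fragile and may not hold for localized messages. `PoolTimeoutDetector` and `IsPoolTimeout` are hypothetical names.

```csharp
using System;

// Hypothetical detector for connection pool timeouts, which surface as
// InvalidOperationException rather than SqlException.
public static class PoolTimeoutDetector
{
    public static bool IsPoolTimeout(Exception ex)
    {
        // Message matching is a heuristic, not a documented contract.
        return ex is InvalidOperationException
            && ex.Message != null
            && ex.Message.IndexOf("Timeout expired", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

A retry predicate could then combine this with an error-number check, so that both SqlException numbers and pool timeouts lead to another attempt.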

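I'll try to update this if anything changes.

Answer 2: (score: 2)

There is no canonical list of retryable codes. Other teams have run into this problem before. The EF team has built a retry strategy, and you may want to look through their code. That list is not complete either, though: I have seen commits to EF on GitHub that amended it.

I had this problem too. I added some obvious error codes that I dug out of SELECT * FROM sys.messages WHERE language_id = 1033 AND text LIKE '%...%'. After that, I added codes as the application ran into them.

You also need to retry on the special error number used for timeouts and network errors. The server cannot generate that number itself, because the connection is broken. I think the number is -2, but you need to verify that.

The error severity levels defined by SQL Server are useless for this purpose (and mostly useless in general).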

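Answer 3: (score: 1)

We are using vendettamit's list and keep extending it whenever we stumble across other error codes. It is important to note that our definition of transient is "worth retrying, it might work next time", and not necessarily a SQL Server issue. So far we have added the following codes:

  • 53: An error occurred while establishing a connection to the server.
  • 109: The pipe has been ended (only needed if you connect to a local SQL Server over named pipes)
  • 11004: Could not connect (usually means the client network is not ready yet)
  • 17142: The server has been paused (handy for testing when the SQL Server "disappears" for a while)

When you know that the connection string is correct (for example, because you managed to establish a connection only recently), you can add the following codes. You should not use them to retry establishing the initial connection, because they can also pop up if your connection string contains wrong parameters.

  • 11001: Host not found (can show up during network changes)
  • 1326: Wrong user name or password (we have seen this pop up when unplugging the virtual Ethernet cable on a VM)
  • 258: Server not found (no TCP reply)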
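The two tiers described above could be expressed roughly as follows. This is a sketch under assumptions: `ExtendedTransientErrors`, `IsTransient` and the `hasConnectedBefore` flag are hypothetical, and the class only covers the codes added in this answer, so it would be combined with whatever base list is already in use.

```csharp
using System.Collections.Generic;

// Hypothetical two-tier classifier for the extra codes listed above.
public static class ExtendedTransientErrors
{
    // Worth retrying in any situation.
    private static readonly HashSet<int> AlwaysRetry =
        new HashSet<int> { 53, 109, 11004, 17142 };

    // Worth retrying only after a connection has succeeded at least once,
    // because a bad connection string can produce the same numbers.
    private static readonly HashSet<int> RetryAfterFirstConnect =
        new HashSet<int> { 11001, 1326, 258 };

    public static bool IsTransient(int errorNumber, bool hasConnectedBefore)
    {
        if (AlwaysRetry.Contains(errorNumber))
        {
            return true;
        }
        return hasConnectedBefore && RetryAfterFirstConnect.Contains(errorNumber);
    }
}
```

The caller is responsible for tracking `hasConnectedBefore`, for example by setting a flag after the first successful open with a given connection string.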