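I'm writing a service that connects to a remote Postgres server. I'm looking for a good way to determine which exceptions should be treated as transient (worth retrying), and how to define an appropriate policy for connecting to the remote database.
The service uses Npgsql for data access. The documentation says that Npgsql throws a PostgresException for SQL errors and an NpgsqlException for "server-related issues".
The best I've been able to come up with so far is to assume that every exception that is not a PostgresException is possibly transient and worth retrying, whereas a PostgresException means something is wrong with the query and a retry won't help. Am I correct in this assumption?
I'm using Polly to create the retry and circuit-breaker policies, so my policy looks like this: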
    Policy.Handle<Exception>(AllButPostgresExceptions()) // a PostgresException will not succeed even with a retry, so don't handle it
        .WaitAndRetryAsync(new[]
        {
            TimeSpan.FromSeconds(1),
            TimeSpan.FromSeconds(2),
            TimeSpan.FromSeconds(4)
        }, onRetry: (exception, span) => Log.Warning(exception, "Postgres Retry Failure: "))
        .WrapAsync(
            Policy.Handle<Exception>(AllButPostgresExceptions())
                .AdvancedCircuitBreakerAsync(
                    failureThreshold: .7,
                    samplingDuration: TimeSpan.FromSeconds(30),
                    minimumThroughput: 20,
                    durationOfBreak: TimeSpan.FromSeconds(30),
                    onBreak: (ex, timeSpan, context) => Log.Warning(ex, "Postgres Circuit Breaker Broken: "),
                    onReset: (context) => Log.Warning("Postgres Circuit Breaker Reset: "),
                    onHalfOpen: () => Log.Warning("Postgres Circuit Breaker Half Open: ")
                ));

    // Matches every exception except PostgresException
    private static Func<Exception, bool> AllButPostgresExceptions()
    {
        return ex => !(ex is PostgresException);
    }
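Is there a better way to determine which errors are likely to be transient?
Update:
Following Shay's suggestion, I opened a new issue in Npgsql and updated my policy to: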
    // Backing field, shown here so the snippet is complete.
    // Note: in Polly v7 and later, async policies derive from AsyncPolicy rather than Policy.
    private static Policy postgresTransientPolicy;

    public static Policy PostgresTransientFaultPolicy
    {
        get
        {
            return postgresTransientPolicy ?? (postgresTransientPolicy = Policy.Handle<Exception>(PostgresDatabaseTransientErrorDetectionStrategy())
                .WaitAndRetryAsync(
                    retryCount: 10,
                    sleepDurationProvider: retryAttempt => ExponentialBackoff(retryAttempt, 1.4),
                    onRetry: (exception, span) => Log.Warning(exception, "Postgres Retry Failure: "))
                .WrapAsync(
                    Policy.Handle<Exception>(PostgresDatabaseTransientErrorDetectionStrategy())
                        .AdvancedCircuitBreakerAsync(
                            failureThreshold: .4,
                            samplingDuration: TimeSpan.FromSeconds(30),
                            minimumThroughput: 20,
                            durationOfBreak: TimeSpan.FromSeconds(30),
                            onBreak: (ex, timeSpan, context) => Log.Warning(ex, "Postgres Circuit Breaker Broken: "),
                            onReset: (context) => Log.Warning("Postgres Circuit Breaker Reset: "),
                            onHalfOpen: () => Log.Warning("Postgres Circuit Breaker Half Open: ")
                        )));
        }
    }
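    private static TimeSpan ExponentialBackoff(int retryAttempt, double exponent)
    {
        // TODO: add a random 20% variance on the exponent
        // Note: retryAttempt^exponent grows polynomially; a strictly exponential
        // backoff would be Math.Pow(exponent, retryAttempt).
        return TimeSpan.FromSeconds(Math.Pow(retryAttempt, exponent));
    }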
    private static Func<Exception, bool> PostgresDatabaseTransientErrorDetectionStrategy()
    {
        return ex =>
        {
            // If it is not a PostgresException, we must assume it is transient
            var pgex = ex as PostgresException;
            if (pgex == null)
                return true;

            switch (pgex.SqlState)
            {
                case "53000": // insufficient_resources
                case "53100": // disk_full
                case "53200": // out_of_memory
                case "53300": // too_many_connections
                case "53400": // configuration_limit_exceeded
                case "57P03": // cannot_connect_now
                case "58000": // system_error
                case "58030": // io_error
                // I am not sure whether the next few should be treated as transient, but I am guessing so
                case "55P03": // lock_not_available
                case "55006": // object_in_use
                case "55000": // object_not_in_prerequisite_state
                case "08000": // connection_exception
                case "08003": // connection_does_not_exist
                case "08006": // connection_failure
                case "08001": // sqlclient_unable_to_establish_sqlconnection
                case "08004": // sqlserver_rejected_establishment_of_sqlconnection
                case "08007": // transaction_resolution_unknown
                    return true;
                default:
                    return false;
            }
        };
    }
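One possible way to handle the TODO in ExponentialBackoff is sketched below. The class name Backoff, the method name WithJitter, and the variance parameter are hypothetical. The sketch also applies the random variance to the computed delay rather than to the exponent, which is an assumption on my part: with retryAttempt = 1, varying the exponent would have no effect.

```csharp
using System;

public static class Backoff
{
    // System.Random is not thread-safe, so access to the shared instance is locked.
    private static readonly Random Rng = new Random();
    private static readonly object Gate = new object();

    // Returns retryAttempt^exponent seconds, scaled by a random factor
    // in [1 - variance, 1 + variance). variance = 0.2 means roughly +/-20%.
    public static TimeSpan WithJitter(int retryAttempt, double exponent, double variance = 0.2)
    {
        double baseSeconds = Math.Pow(retryAttempt, exponent);

        double unit;
        lock (Gate)
        {
            unit = Rng.NextDouble(); // uniform in [0, 1)
        }

        // Map [0, 1) onto [1 - variance, 1 + variance)
        double factor = 1.0 + variance * (2.0 * unit - 1.0);
        return TimeSpan.FromSeconds(baseSeconds * factor);
    }
}
```

It could then be plugged in as sleepDurationProvider: retryAttempt => Backoff.WithJitter(retryAttempt, 1.4). The point of the jitter is to keep many clients from retrying in lockstep after a shared outage.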
Answer 0 (score: 1)
Your approach is good. An NpgsqlException usually means a network/IO error, but you can examine the inner exception and check for an IOException to be sure.
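A minimal sketch of that inner-exception check follows. The helper name HasIoCause is hypothetical, and also counting SocketException as an IO cause is an assumption, not something Npgsql guarantees.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

public static class ExceptionInspection
{
    // Walks the exception and its InnerException chain and reports whether
    // any link is an IOException (or, by assumption, a SocketException).
    public static bool HasIoCause(Exception ex)
    {
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            if (current is IOException || current is SocketException)
                return true;
        }
        return false;
    }
}
```

In a Polly predicate this might be used as Policy.Handle<NpgsqlException>(ex => ExceptionInspection.HasIoCause(ex)), if you want to retry only when the failure is clearly network-related.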
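A PostgresException is thrown when PostgreSQL reports an error, which in most cases is a problem with the query. However, there can be some transient server-side issues (e.g. too many connections), and you can examine the SQL error code for those - see the PG docs.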
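As a rough illustration of checking the error code: the first two characters of a five-character SQLSTATE identify its class, so whole classes can be matched at once. The class and method names below are hypothetical, and the choice of which codes count as transient simply mirrors the list in the question; it is a guess, not an official classification. Matching all of class 08 also picks up 08P01 (protocol_violation), which the question did not list.

```csharp
using System;

public static class SqlStateClassifier
{
    // Returns true for SQLSTATE values that are plausibly transient.
    public static bool IsLikelyTransient(string sqlState)
    {
        if (string.IsNullOrEmpty(sqlState) || sqlState.Length != 5)
            return false;

        // Whole classes: the first two characters are the class code.
        switch (sqlState.Substring(0, 2))
        {
            case "08": // connection exceptions
            case "53": // insufficient resources
                return true;
            default:
                break;
        }

        // Individual codes taken from the question's list.
        switch (sqlState)
        {
            case "57P03": // cannot_connect_now
            case "58000": // system_error
            case "58030": // io_error
            case "55P03": // lock_not_available
            case "55006": // object_in_use
            case "55000": // object_not_in_prerequisite_state
                return true;
            default:
                return false;
        }
    }
}
```

With Npgsql, the value to pass in would be the SqlState property of the caught PostgresException, as in the question's code.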
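It's probably a good idea to add an IsTransient property to these exceptions, encoding these checks inside Npgsql itself - you're welcome to open an issue for that on the Npgsql repo.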