Send parallel requests, but only one per host, with HttpClient and Polly to gracefully handle 429 responses

Asked: 2019-07-13 20:53:29

Tags: c# .net-core web-crawler tpl-dataflow polly

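Intro:

I'm building a single-node web crawler that simply validates that URLs return 200 OK, in a .NET Core console application. I have a collection of URLs at different hosts, to which I'm sending requests with HttpClient. I'm fairly new to Polly and TPL Dataflow.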

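Requirements:

  1. I want to support sending multiple HTTP requests in parallel, with a configurable MaxDegreeOfParallelism.
  2. I want to limit the number of parallel requests to any given host to 1 (or a configurable value). This is in order to gracefully handle per-host 429 TooManyRequests responses with a Polly policy. Alternatively, could I use a circuit breaker to cancel concurrent requests to the same host on receipt of one 429 response, and then proceed one at a time for that specific host?
  3. I'm perfectly fine with not using TPL Dataflow at all, in favor of a Polly Bulkhead or some other mechanism for throttling parallel requests, but I'm not sure what that configuration would look like in order to implement requirement 2.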

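Current implementation:

My current implementation works, except that I often see x parallel requests to the same host return 429 at about the same time. Then they all pause for the retry policy. Then they all slam the same host again at the same time, often still receiving 429s. Even if I distribute multiple instances of the same host evenly throughout the queue, my URL collection is overweighted with a few specific hosts, which eventually start generating 429s anyway.

After receiving a 429, I think I only want to send one concurrent request to that host from then on, to respect the remote host and pursue 200s.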

Validator method:

public async Task<int> GetValidCount(IEnumerable<Uri> urls, CancellationToken cancellationToken)
{
    var validator = new TransformBlock<Uri, bool>(
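        async u =>
        {
            // Dispose the response so its connection is released; with
            // ResponseHeadersRead the body is never read here.
            using (var response = await _httpClient.GetAsync(u, HttpCompletionOption.ResponseHeadersRead, cancellationToken))
                return response.IsSuccessStatusCode;
        },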
        new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = MaxDegreeOfParallelism}
    );
    foreach (var url in urls)
        await validator.SendAsync(url, cancellationToken);
    validator.Complete();
    var validUrlCount = 0;
    while (await validator.OutputAvailableAsync(cancellationToken))
    {
        if(await validator.ReceiveAsync(cancellationToken))
            validUrlCount++;
    }
    await validator.Completion;
    return validUrlCount;
}

Polly policy applied to the HttpClient instance used in GetValidCount() above:

IAsyncPolicy<HttpResponseMessage> waitAndRetryTooManyRequests = Policy
    .HandleResult<HttpResponseMessage>(r => r.StatusCode == HttpStatusCode.TooManyRequests)
    .WaitAndRetryAsync(3,
        (retryCount, response, context) =>
            // RetryAfter is null when the server sends no Retry-After header
            response.Result?.Headers.RetryAfter?.Delta ?? TimeSpan.FromMilliseconds(120),
        async (response, timespan, retryCount, context) =>
        {
            // log stuff
        });
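
A side note on the policy above: `RetryAfter.Delta` is only populated when the server sends `Retry-After` as a number of seconds. The header may also carry an HTTP date, in which case `Date` is populated instead. Below is a minimal sketch of a helper that handles both forms. The name `RetryAfterHelper.GetDelay` and its `now` and `fallback` parameters are hypothetical, introduced only for illustration; they are not part of Polly or HttpClient.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper, not part of Polly or HttpClient.
public static class RetryAfterHelper
{
    // Returns the delay requested by a Retry-After header, or 'fallback'
    // when the header is absent. 'now' is passed in to keep the method testable.
    public static TimeSpan GetDelay(
        HttpResponseMessage response, DateTimeOffset now, TimeSpan fallback)
    {
        var retryAfter = response?.Headers.RetryAfter;
        if (retryAfter == null) return fallback;

        // Form 1: "Retry-After: 120" (delay in seconds)
        if (retryAfter.Delta.HasValue) return retryAfter.Delta.Value;

        // Form 2: "Retry-After: <HTTP-date>" (absolute point in time)
        if (retryAfter.Date.HasValue)
        {
            var delay = retryAfter.Date.Value - now;
            return delay > TimeSpan.Zero ? delay : TimeSpan.Zero; // never negative
        }

        return fallback;
    }
}
```

Inside the `sleepDurationProvider` delegate this could be called as `RetryAfterHelper.GetDelay(response.Result, DateTimeOffset.UtcNow, TimeSpan.FromMilliseconds(120))`.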

Question:

How can I modify or replace this solution so that it also satisfies requirement 2?

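2 answers:

Answer 0 (score: 1)

I'd try introducing some sort of flag, LimitedMode, to detect that this particular client has entered limited mode. Below I declare two policies. The first is a simple retry policy, used only to catch TooManyRequests and set the flag. The second is an out-of-the-box Bulkhead policy.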

    public void ConfigureServices(IServiceCollection services)
    {
        /* other configuration */

        var registry = services.AddPolicyRegistry();

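        // The predicate always returns false, so this policy never actually retries;
        // it serves only as a hook to observe each response and update LimitedMode.
        var catchPolicy = Policy.HandleResult<HttpResponseMessage>(r =>
            {
                LimitedMode = r.StatusCode == HttpStatusCode.TooManyRequests;
                return false;
            })
            .WaitAndRetryAsync(1, i => TimeSpan.FromSeconds(3));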

        var bulkHead = Policy.BulkheadAsync<HttpResponseMessage>(1, 10, OnBulkheadRejectedAsync);

        registry.Add("catchPolicy", catchPolicy);
        registry.Add("bulkHead", bulkHead);

        services.AddHttpClient<CrapyWeatherApiClient>((client) =>
        {
            client.BaseAddress = new Uri("hosturl");
        }).AddPolicyHandlerFromRegistry(PolicySelector);
    }

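Then you might want to decide dynamically which policy to apply, using the PolicySelector mechanism: if limited mode is active, wrap the bulkhead policy with the catch-429 policy. Once a success status code is received, switch back to regular mode without the bulkhead.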

    private IAsyncPolicy<HttpResponseMessage> PolicySelector(IReadOnlyPolicyRegistry<string> registry, HttpRequestMessage request)
    {
        var catchPolicy = registry.Get<IAsyncPolicy<HttpResponseMessage>>("catchPolicy");
        var bulkHead = registry.Get<IAsyncPolicy<HttpResponseMessage>>("bulkHead");
        if (LimitedMode)
        {
            return catchPolicy.WrapAsync(bulkHead);
        }

        return catchPolicy;
    }        
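
One caveat with the approach above is that a single LimitedMode flag and a single bulkhead instance are shared by every host the client talks to, whereas the question asks for per-host behavior. The following is a minimal sketch of keeping that state per host, written as a plain `DelegatingHandler` with a `SemaphoreSlim` per host rather than with Polly. The class name `PerHostLimitedModeHandler` is hypothetical, and the sketch assumes a runtime where `HttpStatusCode.TooManyRequests` is defined (as in the question's code).

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler: once a host answers 429, requests to that host are
// sent one at a time until a gated request succeeds.
public sealed class PerHostLimitedModeHandler : DelegatingHandler
{
    private sealed class HostState
    {
        public readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);
        public volatile bool Limited;
    }

    private readonly ConcurrentDictionary<string, HostState> _hosts =
        new ConcurrentDictionary<string, HostState>(StringComparer.OrdinalIgnoreCase);

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var state = _hosts.GetOrAdd(request.RequestUri.Host, _ => new HostState());

        // Only wait on the gate while this host is in limited mode.
        var gated = state.Limited;
        if (gated)
            await state.Gate.WaitAsync(cancellationToken).ConfigureAwait(false);
        try
        {
            var response = await base.SendAsync(request, cancellationToken)
                .ConfigureAwait(false);

            if (response.StatusCode == HttpStatusCode.TooManyRequests)
                state.Limited = true;   // enter limited mode for this host
            else if (gated && response.IsSuccessStatusCode)
                state.Limited = false;  // a serialized request succeeded

            return response;
        }
        finally
        {
            if (gated) state.Gate.Release();
        }
    }
}
```

It could be wired up with `new HttpClient(new PerHostLimitedModeHandler { InnerHandler = new HttpClientHandler() })`. Note that requests already in flight when the first 429 arrives are not cancelled by this sketch; only requests started afterwards are serialized.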

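Answer 1 (score: 1)

Here is a method that creates a TransformBlock which prevents concurrent execution of messages that have the same key. The key of each message is obtained by invoking the supplied keySelector function. Messages with the same key are processed sequentially with respect to each other (not in parallel). The key is also passed as an argument to the transform function, because it can be useful in some cases.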

public static TransformBlock<TInput, TOutput>
    CreateExclusivePerKeyTransformBlock<TInput, TKey, TOutput>(
    Func<TInput, TKey, Task<TOutput>> transform,
    ExecutionDataflowBlockOptions dataflowBlockOptions,
    Func<TInput, TKey> keySelector,
    IEqualityComparer<TKey> keyComparer = null)
{
    if (transform == null) throw new ArgumentNullException(nameof(transform));
    if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
    if (dataflowBlockOptions == null)
        throw new ArgumentNullException(nameof(dataflowBlockOptions));
    keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;

    var internalCTS = CancellationTokenSource
        .CreateLinkedTokenSource(dataflowBlockOptions.CancellationToken);

    var maxDOP = dataflowBlockOptions.MaxDegreeOfParallelism;
    var taskScheduler = dataflowBlockOptions.TaskScheduler;

    var maxDopSemaphore
        = new SemaphoreSlim(maxDOP == -1 ? Int32.MaxValue : maxDOP);

    var perKeySemaphores = new ConcurrentDictionary<TKey, SemaphoreSlim>(
        keyComparer);

    // The degree of parallelism is controlled by the semaphores
    dataflowBlockOptions.MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded;

    // An exclusive scheduler is needed for preserving the processing order
    dataflowBlockOptions.TaskScheduler =
        new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler;

    var block = new TransformBlock<TInput, TOutput>(async item =>
    {
        var key = keySelector(item);
        var perKeySemaphore = perKeySemaphores
            .GetOrAdd(key, _ => new SemaphoreSlim(1));
        await perKeySemaphore.WaitAsync(internalCTS.Token).ConfigureAwait(false);
        try
        {
            await maxDopSemaphore.WaitAsync(internalCTS.Token)
                .ConfigureAwait(false);
            try
            {
                // Invoke the transform using the provided TaskScheduler
                return await Task.Factory.StartNew(() => transform(item, key),
                    internalCTS.Token, TaskCreationOptions.DenyChildAttach,
                    taskScheduler).Unwrap().ConfigureAwait(false);
            }
            catch (Exception ex) when (!(ex is OperationCanceledException))
            {
                internalCTS.Cancel(); // The block has failed
                throw;
            }
            finally
            {
                maxDopSemaphore.Release();
            }
        }
        finally
        {
            perKeySemaphore.Release();
        }
    }, dataflowBlockOptions);

    _ = block.Completion.ContinueWith(_ => internalCTS.Dispose(),
        TaskScheduler.Default);

    dataflowBlockOptions.MaxDegreeOfParallelism = maxDOP; // Restore initial value
    dataflowBlockOptions.TaskScheduler = taskScheduler; // Restore initial value
    return block;
}
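
The core idea of the method above is the order in which the two semaphores are acquired: the per-key semaphore first, then the global one, so that an item waiting for a busy key does not occupy one of the global parallelism slots. A stripped-down sketch of just that idea, without TPL Dataflow, might look like the following. The name `ExclusivePerKey.RunAsync` is hypothetical, and unlike the block above this sketch starts all items up front, has no bounded capacity, and does not guarantee first-in-first-out processing within a key.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper illustrating per-key exclusivity plus a global cap.
public static class ExclusivePerKey
{
    public static async Task<TOutput[]> RunAsync<TInput, TKey, TOutput>(
        IEnumerable<TInput> items,
        Func<TInput, TKey> keySelector,
        Func<TInput, Task<TOutput>> work,
        int maxDegreeOfParallelism,
        IEqualityComparer<TKey> keyComparer = null)
    {
        var globalGate = new SemaphoreSlim(maxDegreeOfParallelism);
        var perKeyGates = new ConcurrentDictionary<TKey, SemaphoreSlim>(
            keyComparer ?? EqualityComparer<TKey>.Default);

        var tasks = items.Select(async item =>
        {
            var keyGate = perKeyGates.GetOrAdd(
                keySelector(item), _ => new SemaphoreSlim(1));

            // 1) Wait for exclusive access to this key.
            await keyGate.WaitAsync().ConfigureAwait(false);
            try
            {
                // 2) Only then take one of the global slots.
                await globalGate.WaitAsync().ConfigureAwait(false);
                try
                {
                    return await work(item).ConfigureAwait(false);
                }
                finally
                {
                    globalGate.Release();
                }
            }
            finally
            {
                keyGate.Release();
            }
        }).ToList(); // materialize so that every task is started

        // Results come back in the same order as the input items.
        return await Task.WhenAll(tasks).ConfigureAwait(false);
    }
}
```

For the crawler scenario, `keySelector` would be `uri => uri.Host` and `work` would issue the HTTP request.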

Usage example:

var validator = CreateExclusivePerKeyTransformBlock<Uri, string, bool>(
    async (uri, host) =>
    {
        return (await _httpClient.GetAsync(uri, HttpCompletionOption
            .ResponseHeadersRead, token)).IsSuccessStatusCode;
    },
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 30,
        CancellationToken = token,
    },
    keySelector: uri => uri.Host,
    keyComparer: StringComparer.OrdinalIgnoreCase);

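All execution options are supported (MaxDegreeOfParallelism, BoundedCapacity, CancellationToken, EnsureOrdered, etc.).

Below is an overload of CreateExclusivePerKeyTransformBlock that accepts a synchronous delegate, followed by another method with its own overload, CreateExclusivePerKeyActionBlock, that returns an ITargetBlock (which behaves like an ActionBlock) instead of a TransformBlock, with the same per-key behavior.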

public static TransformBlock<TInput, TOutput>
    CreateExclusivePerKeyTransformBlock<TInput, TKey, TOutput>(
    Func<TInput, TKey, TOutput> transform,
    ExecutionDataflowBlockOptions dataflowBlockOptions,
    Func<TInput, TKey> keySelector,
    IEqualityComparer<TKey> keyComparer = null)
{
    return CreateExclusivePerKeyTransformBlock(
        (item, key) => Task.FromResult(transform(item, key)),
        dataflowBlockOptions, keySelector, keyComparer);
}

// An ITargetBlock is similar to an ActionBlock
public static ITargetBlock<TInput>
    CreateExclusivePerKeyActionBlock<TInput, TKey>(
    Func<TInput, TKey, Task> action,
    ExecutionDataflowBlockOptions dataflowBlockOptions,
    Func<TInput, TKey> keySelector,
    IEqualityComparer<TKey> keyComparer = null)
{
    var block = CreateExclusivePerKeyTransformBlock(async (item, key) =>
        { await action(item, key).ConfigureAwait(false); return (object)null; },
        dataflowBlockOptions, keySelector, keyComparer);
    block.LinkTo(DataflowBlock.NullTarget<object>());
    return block;
}

public static ITargetBlock<TInput>
    CreateExclusivePerKeyActionBlock<TInput, TKey>(
    Action<TInput, TKey> action,
    ExecutionDataflowBlockOptions dataflowBlockOptions,
    Func<TInput, TKey> keySelector,
    IEqualityComparer<TKey> keyComparer = null)
{
    return CreateExclusivePerKeyActionBlock(
        (item, key) => { action(item, key); return Task.CompletedTask; },
        dataflowBlockOptions, keySelector, keyComparer);
}