Async/await and multiprocessing

Asked: 2015-02-27 11:12:57

Tags: c# performance asynchronous dotnet-httpclient

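I have a txt file with 500,000 unique domain names, and to start with I simply want to open each site. I am using the asynchronous HttpClient and have tried 3 different ways of splitting up the work: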

1

IEnumerable<string> lines = File.ReadLines("file.txt");
try
{
    DataSet allData;
    var downloadData = new TransformBlock<string, byte[]>(
        async line =>
        {
            HttpClientHandler httpClientHandler = new HttpClientHandler();
            HttpClient client = new HttpClient(httpClientHandler);
            try
            {
                HttpResponseMessage responseMessage =
                    await client.GetAsync(line).ConfigureAwait(false);
                return await responseMessage.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
            }
            catch (Exception ex)
            {
                //catch all to reduce code for testing
                return null;
            }
            finally
            {
                Interlocked.Increment(ref finishedUrls);
            }
        },
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 500,
        });
    foreach (var line in lines)
        downloadData.Post(line);
    downloadData.Complete();
    // Note: a TransformBlock's Completion task finishes only after its output has been consumed.
    await downloadData.Completion;
}
catch (Exception ex)
{
    // outer handler not shown in the original post
}

2

List<Task> allTasks = new List<Task>();
// The post used DataflowBlockOptions.Unbounded here, which is -1 and is rejected by SemaphoreSlim;
// 500 is assumed to match the other two variants.
SemaphoreSlim throttler = new SemaphoreSlim(initialCount: 500);
foreach (var line in lines)
{
    await throttler.WaitAsync().ConfigureAwait(false);
    allTasks.Add(Task.Run(async () =>
    {
        try
        {
            HttpClientHandler httpClientHandler = new HttpClientHandler();
            HttpClient client = new HttpClient(httpClientHandler);
            try
            {
                HttpResponseMessage responseMessage = await client.GetAsync(line).ConfigureAwait(false);
                var byteArray = await responseMessage.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
            }
            catch (Exception ex)
            {
            }
            Interlocked.Increment(ref finishedUrls);
        }
        catch (Exception ex)
        {
        }
        finally
        {
            // Free the slot so the loop can start the next request.
            throttler.Release();
        }
    }));
}
await Task.WhenAll(allTasks);

3

await lines.ForEachAsync(500,cancellationToken,async line =>
{
    HttpClientHandler httpClientHandler = new HttpClientHandler();
    HttpClient client = new HttpClient(httpClientHandler);
    try
    {
        HttpResponseMessage responseMessage = await client.GetAsync(line).ConfigureAwait(false);
        var byteArray = await responseMessage.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
    }
    catch (Exception ex)
    {
    }
    Interlocked.Increment(ref finishedUrls);
}
);

// Extension method; it must be declared inside a static class.
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, CancellationToken cancellationToken,
    Func<T, Task> body)
{
    return Task.WhenAll(
        from partition in Partitioner.Create(source).GetPartitions(dop).AsParallel()
        select Task.Run(async delegate
        {
            using (partition)
                while (partition.MoveNext())
                    await body(partition.Current).ConfigureAwait(false);
        }, cancellationToken));
}

I got the best speed from solution #3: about 12,000 URLs/min, with 10,000 established connections shown in Performance Monitor and CPU usage around 1%.

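But when I split the txt file into 5 parts (100,000 URLs in each file) and run 5 instances of my program, the combined throughput is 25,000 URLs/min with 30,000 established connections and CPU usage of 3%. I have experimented with raising the number of partitions from 500 upward, but it did not make much difference. So my question is: how can I run a single program instance that handles 25,000 URLs/min? How should I divide the async work to get the highest possible speed?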

Is there a .NET limit per process?

The program runs on 64-bit Windows Server 2012 with a 500 Mb network, 64 GB RAM, an SSD, and an E5-1620 v2 CPU.

Update 1: speed results for different "dop" values and for 4 instances running simultaneously: http://pastebin.com/ab3UQPAC

3 Answers:

Answer 0: (score: 0)

Might removing the outer Task help?

Something close to this (minus the exception handling?):

List<Task> allTasks = new List<Task>();
foreach (var line in lines)
{
    HttpClientHandler httpClientHandler = new HttpClientHandler();
    HttpClient client = new HttpClient(httpClientHandler);
    try
    {
        // Unwrap() flattens the Task<Task<byte[]>> returned by ContinueWith,
        // so WhenAll also waits for the body to be read.
        allTasks.Add(client.GetAsync(line)
            .ContinueWith(t => t.Result.Content.ReadAsByteArrayAsync(), TaskContinuationOptions.OnlyOnRanToCompletion)
            .Unwrap());
    }
    catch
    {
    }
}
await Task.WhenAll(allTasks);

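Is it conceivable that you are consuming too many ThreadPool resources by having the outer Tasks wait for the responses? I'm not sure how the scheduler handles this, but the outer Task seems redundant to me.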
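To make the idea of dropping the outer Task concrete, here is a minimal sketch that starts each request directly, without Task.Run, while still capping the number of requests in flight with a SemaphoreSlim. The names ThrottledDownloader, DownloadAllAsync and FetchAsync are hypothetical, and sharing one HttpClient across all requests is an assumption of this sketch rather than something the thread tested.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledDownloader
{
    // Hypothetical helper: fetches every URL with at most maxConcurrency
    // requests in flight and returns how many of them succeeded.
    public static async Task<int> DownloadAllAsync(
        IEnumerable<string> urls, HttpClient client, int maxConcurrency)
    {
        var pending = new List<Task<bool>>();
        using (var throttler = new SemaphoreSlim(maxConcurrency))
        {
            foreach (string url in urls)
            {
                // Wait for a free slot, then start the request without Task.Run.
                await throttler.WaitAsync().ConfigureAwait(false);
                pending.Add(FetchAsync(client, url, throttler));
            }
            bool[] results = await Task.WhenAll(pending).ConfigureAwait(false);
            return results.Count(ok => ok);
        }
    }

    private static async Task<bool> FetchAsync(
        HttpClient client, string url, SemaphoreSlim throttler)
    {
        try
        {
            using (HttpResponseMessage response =
                await client.GetAsync(url).ConfigureAwait(false))
            {
                await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
                return true;
            }
        }
        catch (Exception)
        {
            // Catch-all, as in the question's test code.
            return false;
        }
        finally
        {
            // Free the slot whether the request succeeded or failed.
            throttler.Release();
        }
    }

    public static void Main(string[] args)
    {
        // "file.txt" is the input file name used in the question.
        string path = args.Length > 0 ? args[0] : "file.txt";
        using (var client = new HttpClient())
        {
            int succeeded = DownloadAllAsync(File.ReadLines(path), client, 500)
                .GetAwaiter().GetResult();
            Console.WriteLine("Succeeded: " + succeeded);
        }
    }
}
```

Because FetchAsync is an async method, it only occupies a thread while it is actually running code, so no ThreadPool thread is parked waiting on a response. Whether this changes the throughput reported in the question is not something this sketch can confirm.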

Answer 1: (score: 0)

Try setting System.Net.ServicePointManager.DefaultConnectionLimit to a very high number, such as int.MaxValue.
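A minimal sketch of that setting, assuming the program runs on the .NET Framework, where HttpClient goes through ServicePointManager. The class and method names below are hypothetical. The property has to be set before the first request is made, since it only affects ServicePoint objects created afterwards.

```csharp
using System;
using System.Net;

public static class ConnectionLimitSketch
{
    // Hypothetical helper: raises the default per-ServicePoint connection limit
    // and returns the value now in effect.
    public static int RaiseConnectionLimit(int limit)
    {
        ServicePointManager.DefaultConnectionLimit = limit;
        return ServicePointManager.DefaultConnectionLimit;
    }

    public static void Main()
    {
        // Call this once at startup, before any HttpClient request is issued.
        int current = RaiseConnectionLimit(int.MaxValue);
        Console.WriteLine("DefaultConnectionLimit = " + current);
    }
}
```

Note that this limit applies to each ServicePoint, that is, to each host. With 500,000 unique domains it may therefore not be the limiting factor, so treat it as something to rule out rather than a guaranteed fix.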

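Answer 2: (score: -1)

I think your problem is related to this: Limit of outgoing connections for one process (.Net). Try increasing the maximum number of connections to the number of tasks running concurrently (perhaps the number of cores).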