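I have a txt file with 500,000 unique domain names, and to begin with I just want to open every site. I am using the async HttpClient and have tried 3 different ways of splitting up the work: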
    IEnumerable<string> lines = File.ReadLines("file.txt");

    // #1: TPL Dataflow
    var downloadData = new TransformBlock<string, byte[]>(
        async line =>
        {
            HttpClientHandler httpClientHandler = new HttpClientHandler();
            HttpClient client = new HttpClient(httpClientHandler);
            try
            {
                HttpResponseMessage responseMessage =
                    await client.GetAsync(line).ConfigureAwait(false);
                return
                    await responseMessage.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
            }
            catch (Exception)
            {
                // catch all to reduce code for testing
                return null;
            }
            finally
            {
                Interlocked.Increment(ref finishedUrls);
            }
        },
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 500,
        });
    // Discard the results: a TransformBlock whose output is never consumed never completes.
    downloadData.LinkTo(DataflowBlock.NullTarget<byte[]>());
    foreach (var line in lines)
        downloadData.Post(line);
    downloadData.Complete();
    await downloadData.Completion;

    // #2: SemaphoreSlim
    List<Task> allTasks = new List<Task>();
    // DataflowBlockOptions.Unbounded is -1, which the SemaphoreSlim constructor rejects;
    // 500 is assumed here to match the other two approaches.
    SemaphoreSlim throttler = new SemaphoreSlim(initialCount: 500);
    foreach (var line in lines)
    {
        await throttler.WaitAsync().ConfigureAwait(false);
        allTasks.Add(Task.Run(async () =>
        {
            try
            {
                HttpClientHandler httpClientHandler = new HttpClientHandler();
                HttpClient client = new HttpClient(httpClientHandler);
                try
                {
                    HttpResponseMessage responseMessage = await client.GetAsync(line).ConfigureAwait(false);
                    var byteArray = await responseMessage.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
                }
                catch (Exception)
                {
                }
                Interlocked.Increment(ref finishedUrls);
            }
            catch (Exception)
            {
            }
            finally
            {
                // Free the slot whether the request succeeded or failed.
                throttler.Release();
            }
        }));
    }
    await Task.WhenAll(allTasks);

    // #3: ForEachAsync
    await lines.ForEachAsync(500, cancellationToken, async line =>
    {
        HttpClientHandler httpClientHandler = new HttpClientHandler();
        HttpClient client = new HttpClient(httpClientHandler);
        try
        {
            HttpResponseMessage responseMessage = await client.GetAsync(line).ConfigureAwait(false);
            var byteArray = await responseMessage.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
        }
        catch (Exception)
        {
        }
        Interlocked.Increment(ref finishedUrls);
    });

    // Splits the source into dop partitions; each partition is drained sequentially by its own task.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, CancellationToken cancellationToken,
        Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop).AsParallel()
            select Task.Run(async delegate
            {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current).ConfigureAwait(false);
            }, cancellationToken));
    }

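The best speed I got was with solution #3: about 12,000 URLs/min with 10,000 established connections shown in Performance Monitor, and CPU usage around 1%.
But when I split the txt file into 5 parts (100,000 URLs in each file) and ran 5 instances of my program, the combined speed was 25,000 URLs/min with 30,000 established connections, and CPU usage was 3%. I have been experimenting with more and more partitions, from 500 upward, but it does not make much difference. So my question is: how can I run a single instance of the program that handles 25,000 URLs/min? How should I divide the async work to get the highest possible speed?
Does .NET impose a limit per process?
The program runs on 64-bit Windows Server 2012 with a 500 Mb network, 64 GB RAM, an SSD, and an E5-1620-v2 CPU.
Update 1: speed results for different "dop" values and for 4 instances running at the same time: http://pastebin.com/ab3UQPAC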
Answer 0 (score: 0)
Might removing the outer task help?
Something along these lines (minus the exception handling?):
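    List<Task> allTasks = new List<Task>();
    foreach (var line in lines)
    {
        HttpClientHandler httpClientHandler = new HttpClientHandler();
        HttpClient client = new HttpClient(httpClientHandler);
        try
        {
            // Unwrap() makes WhenAll wait for the body read itself, not just for the continuation that starts it.
            allTasks.Add(client.GetAsync(line)
                .ContinueWith(t => t.Result.Content.ReadAsByteArrayAsync(), TaskContinuationOptions.OnlyOnRanToCompletion)
                .Unwrap());
        }
        catch
        {
            // Only reached if GetAsync throws synchronously.
        }
    }
    // A failed request cancels its continuation, so this await throws unless that is handled.
    await Task.WhenAll(allTasks);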
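Is it conceivable that you are consuming too many ThreadPool resources by having the outer tasks wait for the responses? I am not sure how the scheduler handles this, but the outer task seems redundant to me.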
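
As a rough illustration of the idea above, here is a minimal sketch that throttles with a SemaphoreSlim but starts each request directly, with no Task.Run wrapper. The names ThrottledFetcher, RunAsync, FetchOneAsync and DownloadAllAsync are hypothetical and not part of the question's code. It also assumes a single shared HttpClient, which is the usage the HttpClient documentation recommends, instead of a new client per URL. Whether this actually raises throughput on the asker's machine has not been measured.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledFetcher
{
    // Assumption: one HttpClient shared by every request.
    private static readonly HttpClient Client = new HttpClient();

    // Runs fetch for every URL with at most maxConcurrency requests in flight.
    // Returns the number of URLs for which fetch completed without throwing.
    public static async Task<int> RunAsync(
        IEnumerable<string> urls, int maxConcurrency, Func<string, Task> fetch)
    {
        var throttler = new SemaphoreSlim(maxConcurrency);
        var pending = new List<Task<bool>>();
        foreach (var url in urls)
        {
            // Wait for a free slot, then start the request without Task.Run.
            await throttler.WaitAsync().ConfigureAwait(false);
            pending.Add(FetchOneAsync(url, fetch, throttler));
        }
        bool[] results = await Task.WhenAll(pending).ConfigureAwait(false);
        return results.Count(ok => ok);
    }

    private static async Task<bool> FetchOneAsync(
        string url, Func<string, Task> fetch, SemaphoreSlim throttler)
    {
        try
        {
            await fetch(url).ConfigureAwait(false);
            return true;
        }
        catch (Exception)
        {
            // Catch-all, as in the question's test code.
            return false;
        }
        finally
        {
            // Free the slot whether the request succeeded or failed.
            throttler.Release();
        }
    }

    // Example wiring for the question's scenario: one URL per line in the file at path.
    public static Task<int> DownloadAllAsync(string path, int maxConcurrency)
    {
        return RunAsync(File.ReadLines(path), maxConcurrency, async url =>
        {
            using (HttpResponseMessage response = await Client.GetAsync(url).ConfigureAwait(false))
            {
                await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
            }
        });
    }
}
```

Calling `await ThrottledFetcher.DownloadAllAsync("file.txt", 500)` would mirror the degree of parallelism used in the question. Passing the fetch delegate in separately keeps the throttling logic testable without network access.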
Answer 1 (score: 0)
Try setting System.Net.ServicePointManager.DefaultConnectionLimit to a very high number, such as int.MaxValue.
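
A minimal sketch of that setting, assuming .NET Framework, where HttpClient goes through ServicePointManager. The ConnectionLimits class and its Raise method are hypothetical names used only for illustration. The limit applies per ServicePoint (roughly, per scheme and host), so with 500,000 distinct domains it may matter less than it would when hammering a single host. That is an assumption to verify by measurement, not a confirmed diagnosis.

```csharp
using System;
using System.Net;

public static class ConnectionLimits
{
    // Hypothetical helper: raises the default per-ServicePoint connection limit
    // and returns the previous value. Call it once at startup, before the first
    // request is sent, because the default only applies to ServicePoint objects
    // created afterwards.
    public static int Raise(int limit)
    {
        int previous = ServicePointManager.DefaultConnectionLimit;
        ServicePointManager.DefaultConnectionLimit = limit;
        return previous;
    }
}
```

For example, `ConnectionLimits.Raise(int.MaxValue);` at the top of Main corresponds to the suggestion above. On .NET Core and later, HttpClient does not use ServicePointManager, and the comparable knob is HttpClientHandler.MaxConnectionsPerServer.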
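Answer 2 (score: -1)
I think your problem is related to this: Limit of outgoing connections for one process (.Net). Try increasing the maximum number of connections to the number of tasks running at the same time (probably the number of cores).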