I have a list of 100 URLs and need to fetch the HTML content of each one. Suppose I don't use the async version of DownloadString and instead do the following:
var task1 = Task.Factory.StartNew(() => new WebClient().DownloadString("url1"));
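What I want is to have at most 4 URLs being fetched at any one time.
I start 4 tasks for the first four URLs. Say the second URL finishes: I want to immediately start a fifth task for the fifth URL, and so on. That way no more than 4 URLs are ever being downloaded at once and, for all practical purposes, there are always 4 downloads in flight until all 100 URLs have been processed.
I can't seem to picture how I would achieve this. There must be an established pattern for it. Thoughts?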
EDIT
Following up on @Damien_The_Unbeliever's comment about using Parallel.ForEach, I wrote the following:
var urls = new List<string>();   // the 100 URLs go here
var results = new Dictionary<string, string>();
var lockObj = new object();
Parallel.ForEach(urls,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },   // at most 4 downloads at a time
    url =>
    {
        var str = new WebClient().DownloadString(url);
        // Dictionary<TKey, TValue> is not thread-safe, so writes are serialized.
        lock (lockObj)
        {
            results[url] = str;
        }
    });
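I think the above is better than creating individual tasks and throttling concurrency with a semaphore. That said, having never used Parallel.ForEach before, I'm not sure whether this correctly does what I need.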
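For comparison, here is a minimal sketch of the same Parallel.ForEach idea wrapped in a reusable method. The class name `ParallelDownloader` and the `fetch` delegate are hypothetical; `fetch` stands in for `url => new WebClient().DownloadString(url)` so the sketch can run without network access. It swaps the explicit lock for a ConcurrentDictionary, which is one option among several rather than a required change.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelDownloader
{
    // fetch is a hypothetical stand-in for
    // url => new WebClient().DownloadString(url).
    public static IDictionary<string, string> FetchAll(
        IEnumerable<string> urls,
        Func<string, string> fetch,
        int maxDegreeOfParallelism)
    {
        // ConcurrentDictionary allows concurrent writes without an explicit lock.
        var results = new ConcurrentDictionary<string, string>();

        // Blocks the caller until every URL has been processed.
        Parallel.ForEach(
            urls,
            new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism },
            url => { results[url] = fetch(url); });

        return results;
    }
}
```

Note that MaxDegreeOfParallelism is an upper bound: it caps concurrency at 4 but does not promise that exactly 4 downloads are running at every moment.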
Answer 0 (score: 8)
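// Allow at most 4 downloads to run at the same time.
SemaphoreSlim sem = new SemaphoreSlim(4);
foreach (var url in urls)
{
    sem.Wait();             // blocks until one of the 4 slots is free
    var currentUrl = url;   // local copy: before C# 5 the foreach variable is shared by all closures
    Task.Factory.StartNew(() => new WebClient().DownloadString(currentUrl))
        .ContinueWith(t => sem.Release());   // free the slot whether the download succeeded or failed
}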
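The snippet above throttles the downloads but discards the downloaded strings and does not wait for the last tasks to finish. A minimal sketch of one way to add both, assuming a hypothetical helper class `ThrottledDownloader` and a hypothetical `fetch` delegate that stands in for `url => new WebClient().DownloadString(url)`:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledDownloader
{
    // fetch is a hypothetical stand-in for
    // url => new WebClient().DownloadString(url).
    public static IDictionary<string, string> FetchAll(
        IEnumerable<string> urls,
        Func<string, string> fetch,
        int maxConcurrency)
    {
        var results = new ConcurrentDictionary<string, string>();
        var tasks = new List<Task>();

        using (var sem = new SemaphoreSlim(maxConcurrency))
        {
            foreach (var url in urls)
            {
                sem.Wait();             // block until a slot is free
                var currentUrl = url;   // per-iteration copy for the closure

                tasks.Add(Task.Factory.StartNew(() =>
                {
                    try
                    {
                        results[currentUrl] = fetch(currentUrl);
                    }
                    finally
                    {
                        sem.Release();  // free the slot even if fetch throws
                    }
                }));
            }

            // Wait for the remaining downloads before the semaphore is disposed.
            // Failures surface here as an AggregateException.
            Task.WaitAll(tasks.ToArray());
        }

        return results;
    }
}
```

A call would look like `ThrottledDownloader.FetchAll(urls, u => new WebClient().DownloadString(u), 4)`. Releasing the semaphore inside `finally` rather than in a continuation is a design choice of this sketch, not something the answer prescribes.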
Answer 1 (score: 1)
Actually, Task.WaitAny is a better fit for this than ContinueWith:
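// Assumes urls is the list of URLs from the question.
Func<string, Task<string>> start =
    u => Task.Factory.StartNew(() => new WebClient().DownloadString(u));
var remaining = new Queue<string>(urls);
var tasks = new List<Task<string>>();
// Initial 4 tasks (fewer if the list is shorter).
while (tasks.Count < 4 && remaining.Count > 0)
    tasks.Add(start(remaining.Dequeue()));
int tasksPerformedCount = 0;
while (tasks.Count > 0)
{
    // Returns the index of the first task to complete, as soon as it completes.
    int index = Task.WaitAny(tasks.ToArray());
    tasksPerformedCount++;
    tasks.RemoveAt(index);
    // Replace it with a new one while URLs remain.
    if (remaining.Count > 0)
        tasks.Add(start(remaining.Dequeue()));
}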
Another example of Task.WaitAny:
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace Chapter1 {
    public static class Program {
        public static void Main() {
            Task<int>[] tasks = new Task<int>[3];
            tasks[0] = Task.Run(() => { Thread.Sleep(2000); return 1; });
            tasks[1] = Task.Run(() => { Thread.Sleep(1000); return 2; });
            tasks[2] = Task.Run(() => { Thread.Sleep(3000); return 3; });
            // Handle the tasks in completion order, dropping each one once it has been processed.
            while (tasks.Length > 0)
            {
                int i = Task.WaitAny(tasks);
                Task<int> completedTask = tasks[i];
                Console.WriteLine(completedTask.Result);
                // Rebuild the array without the completed task.
                var temp = tasks.ToList();
                temp.RemoveAt(i);
                tasks = temp.ToArray();
            }
        }
    }
}