Question

What I want is the equivalent of

var docs = new LinkedList<string>();
for(int i = 0; ; ++i)
{
    string html = client.DownloadString($"http://someforum.com/page?id={i}"); 
    if(html == null)
       break;
    docs.AddLast(html);
}

except that it takes advantage of the fact that client.DownloadString($"http://someforum.com/page?id={i}"); is a long-running task that can run on a different thread.

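Basically, what I'm trying to do is get the HTML from the pages

"http://someforum.com/page?id=0", "http://someforum.com/page?id=1", ...

with one addition: once I fail to get a page for id=m, I assume that any task trying to fetch a page with id=n, n>m, will not get a page either and can be shut down.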

Answer 1

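The program you want to parallelize consists mainly of IO calls, so asynchronous programming with TaskCompletionSource is a better fit, since WebClient's DownloadStringAsync method returns void. Here is a modified version of ReadData: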

public Task<string> ReadData(int i)
{
    TaskCompletionSource<string> tcs = new TaskCompletionSource<string>();
    var client = new WebClient();
    string uriString = @"http://someforum.com/page?id=" + i;
    client.DownloadStringCompleted += (sender,args) =>
    {
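         // Complete the task exactly once, depending on how the download ended
         if (args.Cancelled)
             tcs.TrySetCanceled();
         else if (args.Error != null)
             tcs.TrySetException(args.Error);
         else
             tcs.TrySetResult(args.Result);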
    };

    client.DownloadStringAsync(new Uri(uriString));

    return tcs.Task;
}

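Calling ReadData asynchronously

This is best done from an async method, which can await until all of the download calls have returned. Also, because multiple async calls are in flight at once, it is best to set a limit on i: unlike the synchronous version, you cannot check the value returned by each download as you go, since all of the calls are processed together.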

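// limit is an assumed, caller-supplied upper bound on i
public async Task<LinkedList<string>> ReadDataAsync(int limit)
{
    var docs = new LinkedList<string>();

    List<Task<string>> taskList = new List<Task<string>>();

    // Set a limit on i: the calls do not run synchronously, so you cannot keep checking which value yields no page
    for (int i = 0; i < limit; ++i)
    {
        int localId = i;
        taskList.Add(ReadData(localId));
    }

    try
    {
        await Task.WhenAll(taskList);
    }
    catch (Exception)
    {
        // WhenAll rethrows if any download failed or was cancelled; those tasks are skipped below
    }

    // Build the linked list: a result can be accessed only if the task was not cancelled and has no error
    foreach (Task<string> task in taskList)
    {
        if (task.Status == TaskStatus.RanToCompletion)
            docs.AddLast(task.Result);
    }

    return docs;
}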
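
If a fixed limit on i is not acceptable, one possible compromise is to download in batches and stop at the first missing page. The following is only a sketch under assumptions: fetchPage is a hypothetical delegate that returns null for a missing page (ReadData above would instead produce a faulted task, so it would need a wrapper that maps the error to null), and BatchedReader, ReadUntilMissingAsync and batchSize are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchedReader
{
    // fetchPage is a hypothetical delegate: it returns the page HTML,
    // or null when the page does not exist.
    public static async Task<LinkedList<string>> ReadUntilMissingAsync(
        Func<int, Task<string>> fetchPage, int batchSize)
    {
        var docs = new LinkedList<string>();
        for (int start = 0; ; start += batchSize)
        {
            // Start one batch of downloads concurrently and wait for all of them.
            Task<string>[] batch = Enumerable.Range(start, batchSize)
                                             .Select(id => fetchPage(id))
                                             .ToArray();
            string[] pages = await Task.WhenAll(batch);

            // Results come back in id order; stop at the first missing page.
            foreach (string html in pages)
            {
                if (html == null)
                    return docs;
                docs.AddLast(html);
            }
        }
    }
}
```

At most batchSize - 1 downloads are wasted past the first missing id, which is the price paid for not knowing the limit in advance.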

Answer 2

You can use the Parallel class instead of tasks. Something like this:

var docs = new LinkedList<string>();
var result = Parallel.For(0, N, n =>
{
    string html = new WebClient().DownloadString($"http://someforum.com/page?id={n}"); 
    if(html != null)
    {
        lock(docs)
        {
            docs.AddLast(html);
        }
    }
});
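
The loop above does not stop early when a page is missing. A possible variant, sketched here under assumptions, uses the Parallel.For overload that passes a ParallelLoopState and calls Break() on the first missing page: iterations with a lower index still finish, while no new iterations with a higher index are started. fetchPage is a hypothetical delegate that returns null for a missing page (WebClient.DownloadString would throw instead), and ParallelReader, ReadUntilMissing and maxPages are illustrative names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelReader
{
    // fetchPage is a hypothetical delegate: it returns the page HTML,
    // or null when the page does not exist.
    public static LinkedList<string> ReadUntilMissing(Func<int, string> fetchPage, int maxPages)
    {
        var pages = new ConcurrentDictionary<int, string>();
        ParallelLoopResult result = Parallel.For(0, maxPages, (n, loopState) =>
        {
            string html = fetchPage(n);
            if (html == null)
                loopState.Break(); // lower ids still complete; no new higher ids are started
            else
                pages[n] = html;
        });

        // LowestBreakIteration is the smallest id on which Break() was called,
        // or null if the loop ran to the end.
        long firstMissing = result.LowestBreakIteration ?? maxPages;

        // Some higher ids may already have been fetched; drop them and restore id order.
        var docs = new LinkedList<string>();
        foreach (var pair in pages.Where(p => p.Key < firstMissing).OrderBy(p => p.Key))
            docs.AddLast(pair.Value);
        return docs;
    }
}
```

Collecting into a dictionary keyed by id, rather than locking a shared list, also keeps the pages in id order, which the lock-based version does not guarantee.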

How can I parallelize this pattern?
