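Process a collection of URLs in parallel and return an IEnumerable

Time: 2013-02-07 09:30:28

Tags: c# parallel-processing web-scraping system.reactive

I have a collection of URLs to scrape. I want to download the resources in parallel and return a collection of strongly typed results.

I have WebClient.DownloadString() and a method MyTypedResult Process(string s).

How can I wrap these to get a string[] urls => IEnumerable<MyTypedResult> transformation?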

string[] urls = {"url1","url2","url3"};
List<MyTypedResult> ResultCollection = new List<MyTypedResult>();
foreach (var u in urls)
{
    WebClient wc = new WebClient();
    var content = wc.DownloadString(u);
    MyTypedResult r = Process(content);
    ResultCollection.Add(r);
}

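I want the web requests to run in parallel, but I still need to end up with the results collected in a List.

3 answers:

Answer 0 (score: 4)

You can use HttpClient, the new toy in .NET 4.5, to fetch the results in parallel: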

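var httpClient = new HttpClient();

// ToArray() materializes the query so each request is started exactly once;
// a lazy Select would re-issue every request each time it is enumerated.
var tasks = urls.Select(url => httpClient.GetStringAsync(url)
                        .ContinueWith(task =>
                        {
                            string response = task.Result;
                            return ConvertToStrongType(response);
                        }))
                .ToArray();

Task.WaitAll(tasks);
var results = tasks.Select(t => t.Result);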
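
The same idea can also be written with async/await and Task.WhenAll, which avoids blocking a thread while the downloads are in flight. The following is only a sketch under some assumptions: the class name ParallelDownloader, the injected fetch delegate, and the placeholder bodies of MyTypedResult and Process are hypothetical and stand in for the question's own types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical placeholder for the question's result type.
public sealed class MyTypedResult
{
    public int Length { get; set; }
}

public static class ParallelDownloader
{
    // Hypothetical stand-in for the question's Process(string) method.
    public static MyTypedResult Process(string content)
    {
        return new MyTypedResult { Length = content.Length };
    }

    // 'fetch' is injected (an assumption of this sketch) so the method can be
    // exercised without network access; in real use it would wrap
    // HttpClient.GetStringAsync.
    public static async Task<IReadOnlyList<MyTypedResult>> DownloadAllAsync(
        IEnumerable<string> urls, Func<string, Task<string>> fetch)
    {
        // ToArray() forces the query, so every request starts exactly once.
        Task<MyTypedResult>[] tasks = urls
            .Select(async url => Process(await fetch(url)))
            .ToArray();

        // WhenAll yields the results in the same order as the input tasks.
        return await Task.WhenAll(tasks);
    }
}
```

A call might look like `var results = await ParallelDownloader.DownloadAllAsync(urls, url => httpClient.GetStringAsync(url));`. Because the array returned by Task.WhenAll follows the order of the tasks, the results line up with the input URLs.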

Answer 1 (score: 2)

Here is an Rx version using HttpClient:

var urls = new[] { "url1", "url2", "url3" };
var client = new HttpClient();
var results = from url in urls.ToObservable()
              from content in client.GetStringAsync(url).ToObservable()
              select Process(content);
var enumerable = results.ToEnumerable();
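
If the URL list is long, it may be worth capping how many requests are in flight at once. One way to sketch that in Rx is to build one deferred observable per URL and combine them with the Merge overload that takes a maximum concurrency. This assumes the System.Reactive package is referenced; RxDownloader, fetch, process, and maxConcurrent are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class RxDownloader
{
    // 'fetch' and 'process' are injected (an assumption of this sketch);
    // in real use they would be client.GetStringAsync and Process.
    public static IEnumerable<TResult> DownloadAll<TResult>(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetch,
        Func<string, TResult> process,
        int maxConcurrent)
    {
        return urls
            .ToObservable()
            // FromAsync defers the request until Merge subscribes to it.
            .Select(url => Observable.FromAsync(() => fetch(url)))
            // At most maxConcurrent inner sequences are subscribed at a time.
            .Merge(maxConcurrent)
            .Select(process)
            // Blocks the consumer while enumerating until the sequence completes.
            .ToEnumerable();
    }
}
```

Note that results arrive in completion order, which is not necessarily the order of the input URLs.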

Answer 2 (score: 1)

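Here is code that uses Parallel.ForEach to download the content of the URLs in parallel. You need a thread-safe collection such as ConcurrentBag<T> from System.Collections.Concurrent (the framework has no ConcurrentList type) so that the collection can be filled in parallel without locking issues.

void YourTask()
{
    string[] urls = {"url1","url2","url3"};
    var ResultCollection = new ConcurrentBag<MyTypedResult>();

    Parallel.ForEach(urls, url =>
    {
        MyTypedResult myTypedResult = GetData(url);
        ResultCollection.Add(myTypedResult);
    });

    // At this point all parallel tasks have completed and ResultCollection
    // holds the results (in no particular order).
}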

MyTypedResult GetData(string url)
{
    // WebClient is IDisposable, so release it once the download finishes.
    using (WebClient wc = new WebClient())
    {
        var content = wc.DownloadString(url);
        return Process(content);
    }
}
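
As a possible alternative to filling a concurrent collection by hand, PLINQ can produce the list directly and keep the results in input order. This is a minimal sketch, not part of the original answer: PlinqDownloader, getData, and maxParallel are hypothetical names, and getData is expected to be something like the GetData method above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlinqDownloader
{
    // 'getData' is injected (an assumption of this sketch) so the method
    // works with any blocking download-and-process function.
    public static List<TResult> DownloadAll<TResult>(
        IEnumerable<string> urls, Func<string, TResult> getData, int maxParallel)
    {
        return urls
            .AsParallel()
            // Keep the output in the same order as the input URLs.
            .AsOrdered()
            // PLINQ defaults to roughly one worker per core; blocking I/O
            // usually benefits from setting the limit explicitly.
            .WithDegreeOfParallelism(maxParallel)
            .Select(getData)
            .ToList();
    }
}
```

A call might look like `List<MyTypedResult> results = PlinqDownloader.DownloadAll(urls, GetData, 8);`, where 8 is an arbitrary illustrative limit.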