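Download all files at once using DownloadFileTaskAsync

Date: 2015-09-04 05:34:13

Tags: c# task webclient

Given an input text file containing URLs, I want to download the corresponding files all at once. I used the answer to the question "UserState using WebClient and TaskAsync download from Async CTP" as a reference.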

public void Run()
{
    List<string> urls = File.ReadAllLines(@"c:/temp/Input/input.txt").ToList();

    int index = 0;
    Task[] tasks = new Task[urls.Count()];
    foreach (string url in urls)
    {
        WebClient wc = new WebClient();
        string path = string.Format("{0}image-{1}.jpg", @"c:/temp/Output/", index+1);
        Task downloadTask = wc.DownloadFileTaskAsync(new Uri(url), path);
        Task outputTask = downloadTask.ContinueWith(t => Output(path));
        tasks[index] = outputTask;
        index++;
    }
    Console.WriteLine("Start now");
    Task.WhenAll(tasks);
    Console.WriteLine("Done");

}

public void Output(string path)
{
    Console.WriteLine(path);
}

I expected the downloads to start at "Task.WhenAll(tasks)". But the output turns out to look like this:

c:/temp/Output/image-2.jpg
c:/temp/Output/image-1.jpg
c:/temp/Output/image-4.jpg
c:/temp/Output/image-6.jpg
c:/temp/Output/image-3.jpg
[many lines deleted]
Start now
c:/temp/Output/image-18.jpg
c:/temp/Output/image-19.jpg
c:/temp/Output/image-20.jpg
c:/temp/Output/image-21.jpg
c:/temp/Output/image-23.jpg
[many lines deleted]
Done

Why do the downloads start before WaitAll is called? What can I change to achieve what I want (i.e. all tasks start at the same time)?

Thanks

3 Answers:

Answer 0 (score: 3)

  

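Why do the downloads start before WaitAll is called?

First of all, you are not calling Task.WaitAll, which blocks synchronously. You are calling Task.WhenAll, which returns an awaitable that should be awaited.

Now, as others have said, calling an async method kicks off the asynchronous operation even if you never await it, because any method that conforms to TAP returns a "hot" task.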
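To make the "hot task" point concrete, here is a minimal sketch. `FakeDownloadAsync` is a hypothetical stand-in for `wc.DownloadFileTaskAsync` (it uses `Task.Delay` rather than touching the network), and the `Events` list exists only for illustration, so the ordering can be inspected.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class HotTaskDemo
{
    // Records the order of events so the behaviour can be inspected afterwards.
    public static readonly List<string> Events = new List<string>();

    // Hypothetical stand-in for wc.DownloadFileTaskAsync: everything before the
    // first await runs synchronously, as part of the call itself.
    private static async Task FakeDownloadAsync(int id)
    {
        Events.Add("download " + id + " started");
        await Task.Delay(50);
    }

    public static async Task RunAsync()
    {
        Events.Clear();
        Task[] tasks = new Task[3];
        for (int i = 0; i < tasks.Length; i++)
        {
            // The operation is already under way when this call returns.
            tasks[i] = FakeDownloadAsync(i + 1);
        }

        Events.Add("Start now");
        await Task.WhenAll(tasks); // only waits for completion; starts nothing
        Events.Add("Done");
    }
}
```

After `RunAsync` completes, all three "started" entries appear before "Start now", which mirrors the output in the question.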

  

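What can I change to achieve what I want (i.e. all tasks start at the same time)?

Now, if you want to defer execution until Task.WhenAll, you can use Enumerable.Select to project each element to a Task, and let the query be materialized when you pass it to Task.WhenAll: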

public async Task RunAsync()
{
    IEnumerable<string> urls = File.ReadAllLines(@"c:/temp/Input/input.txt");

    var urlTasks = urls.Select((url, index) =>
    {
        WebClient wc = new WebClient();
        string path = string.Format("{0}image-{1}.jpg", @"c:/temp/Output/", index);

        var downloadTask = wc.DownloadFileTaskAsync(new Uri(url), path);
        Output(path);

        return downloadTask;
    });

    Console.WriteLine("Start now");
    await Task.WhenAll(urlTasks);
    Console.WriteLine("Done");
}
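The deferral above relies on `Enumerable.Select` being lazy. The following sketch isolates that behaviour under the same assumptions as before: `FakeDownloadAsync` and `Events` are hypothetical helpers, and no real download takes place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class DeferredSelectDemo
{
    public static readonly List<string> Events = new List<string>();

    // Hypothetical stand-in for wc.DownloadFileTaskAsync.
    private static async Task FakeDownloadAsync(int id)
    {
        Events.Add("download " + id + " started");
        await Task.Delay(50);
    }

    public static async Task RunAsync()
    {
        Events.Clear();

        // No lambda runs here: Select only builds a lazy query.
        IEnumerable<Task> lazyTasks =
            Enumerable.Range(1, 3).Select(id => FakeDownloadAsync(id));

        Events.Add("Start now");

        // Task.WhenAll enumerates the query, which is what invokes the lambda
        // for each element and thereby starts each download.
        await Task.WhenAll(lazyTasks);
        Events.Add("Done");
    }
}
```

Two caveats apply. The downloads still start one after another in quick succession rather than at literally the same instant, and enumerating `lazyTasks` a second time would start a fresh set of downloads, so the query should be consumed only once.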

Answer 1 (score: 0)

  

Why do the downloads start before WaitAll is called?

Because

  

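Tasks created by its public constructors are referred to as "cold" tasks, in that they begin their life cycle in the non-scheduled TaskStatus.Created state, and it is not until Start is called on these instances that they progress to being scheduled. All other tasks begin their life cycle in a "hot" state, meaning that the asynchronous operations they represent have already been initiated and their TaskStatus is an enumeration value other than Created. All tasks returned from TAP methods must be "hot".

Since DownloadFileTaskAsync is a TAP method, it returns a "hot" (i.e. already started) task.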
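The cold/hot distinction can be observed directly through `Task.Status`. This is a small sketch; the class and method names are made up for illustration, and `Task.Delay` merely plays the role of a task handed back by a TAP-style method.

```csharp
using System;
using System.Threading.Tasks;

public static class ColdTaskDemo
{
    // Returns the statuses observed at three points, in order:
    // cold task before Start, cold task after it ran, hot task right after creation.
    public static TaskStatus[] Observe()
    {
        // "Cold": built with the public constructor, not scheduled yet.
        Task cold = new Task(() => { });
        TaskStatus beforeStart = cold.Status; // TaskStatus.Created

        cold.Start();
        cold.Wait();
        TaskStatus afterRun = cold.Status; // TaskStatus.RanToCompletion

        // "Hot": the timer behind Task.Delay is already running when the
        // method returns, so the status is something other than Created.
        Task hot = Task.Delay(50);
        TaskStatus hotStatus = hot.Status;
        hot.Wait();

        return new[] { beforeStart, afterRun, hotStatus };
    }
}
```

Only the constructor-created task ever reports `TaskStatus.Created`, which is why a task returned by `DownloadFileTaskAsync` cannot be held back until a later "start" call.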

  

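What can I change to achieve what I want (i.e. all tasks start at the same time)?

I would look at TPL Dataflow. Something like this (I used HttpClient instead of WebClient, but that does not really matter):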

    // Requires: using System.Net.Http; using System.Threading.Tasks.Dataflow;
    static async Task DownloadData(IEnumerable<string> urls)
    {
        // we want to execute this in parallel
        var executionOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

        // this block receives a URL and downloads the content it points to
        var downloadBlock = new TransformBlock<string, Tuple<string, string>>(async url =>
        {
            using (var client = new HttpClient())
            {
                var content = await client.GetStringAsync(url);
                return Tuple.Create(url, content);
            }
        }, executionOptions);

        // this block prints the length of the downloaded content
        var outputBlock = new ActionBlock<Tuple<string, string>>(tuple =>
        {
            Console.WriteLine($"Downloaded {(string.IsNullOrEmpty(tuple.Item2) ? 0 : tuple.Item2.Length)} bytes from {tuple.Item1}");
        }, executionOptions);

        // link downloadBlock to outputBlock: every item produced by
        // downloadBlock is forwarded to outputBlock
        using (downloadBlock.LinkTo(outputBlock))
        {
            // fill downloadBlock with input data
            foreach (var url in urls)
            {
                await downloadBlock.SendAsync(url);
            }

            // tell downloadBlock that no more input is coming
            // (it has been processing items as they arrived)
            downloadBlock.Complete();
            // wait while downloading data
            await downloadBlock.Completion;
            // tell outputBlock that no more input is coming
            outputBlock.Complete();
            // wait while printing output
            await outputBlock.Completion;
        }
    }

    static void Main(string[] args)
    {
        var urls = new[]
        {
            "http://www.microsoft.com",
            "http://www.google.com",
            "http://stackoverflow.com",
            "http://www.amazon.com",
            "http://www.asp.net"
        };

        Console.WriteLine("Start now.");
        DownloadData(urls).Wait();
        Console.WriteLine("Done.");

        Console.ReadLine();
    }

Output:

  

Start now.
Downloaded 1020 bytes from http://www.microsoft.com
Downloaded 53108 bytes from http://www.google.com
Downloaded 244143 bytes from http://stackoverflow.com
Downloaded 468922 bytes from http://www.amazon.com
Downloaded 27771 bytes from http://www.asp.net
Done.

Answer 2 (score: -1)

  

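What can I change to achieve what I want (i.e. all tasks start at the same time)?

To synchronize the start of the downloads, you can use the Barrier class.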

  public void Run()
  {
      List<string> urls = File.ReadAllLines(@"c:/temp/Input/input.txt").ToList();

      // The post-phase action runs once, after every participant has reached the barrier
      Barrier barrier = new Barrier(urls.Count, b => Console.WriteLine("Start now"));

      Task[] tasks = new Task[urls.Count];

      Parallel.For(0, urls.Count, index =>
      {
          string path = string.Format("{0}image-{1}.jpg", @"c:/temp/Output/", index + 1);
          tasks[index] = DownloadAsync(new Uri(urls[index]), path, barrier);
      });

      Task.WaitAll(tasks); // wait for completion
      Console.WriteLine("Done");
  }

  async Task DownloadAsync(Uri url, string path, Barrier barrier)
  {
      using (WebClient wc = new WebClient())
      {
          // Blocks the calling thread until all participants have arrived,
          // so one thread per URL is held at the barrier at the same time
          barrier.SignalAndWait();
          await wc.DownloadFileTaskAsync(url, path);
          Output(path);
      }
  }
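Because `Barrier.SignalAndWait` blocks one thread per participant, a non-blocking variation on the same idea may be worth considering. The sketch below swaps the Barrier for a different technique, a shared `TaskCompletionSource<bool>` used as a start gate; it is not part of the answer above. `DownloadWhenOpenAsync` and `Events` are hypothetical, and `Task.Delay` stands in for `wc.DownloadFileTaskAsync(url, path)`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class StartGateDemo
{
    public static readonly List<string> Events = new List<string>();
    private static readonly object Sync = new object();

    private static void Record(string entry)
    {
        lock (Sync) { Events.Add(entry); }
    }

    // Each download awaits the shared gate first; awaiting suspends the
    // method without blocking a thread.
    private static async Task DownloadWhenOpenAsync(int id, Task gate)
    {
        await gate;
        Record("download " + id + " started");
        await Task.Delay(50); // stand-in for wc.DownloadFileTaskAsync(url, path)
    }

    public static async Task RunAsync()
    {
        lock (Sync) { Events.Clear(); }
        var gate = new TaskCompletionSource<bool>();

        // All tasks are created up front, but each is parked at the gate.
        Task[] tasks = Enumerable.Range(1, 3)
            .Select(id => DownloadWhenOpenAsync(id, gate.Task))
            .ToArray();

        Record("Start now");
        gate.SetResult(true); // opens the gate for every waiting download
        await Task.WhenAll(tasks);
        Record("Done");
    }
}
```

In this arrangement nothing is recorded as started until the gate is opened, and no thread sits idle while waiting for the other participants.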