Parallel.For and HttpClient crash the application (C#)

Time: 2017-08-10 04:54:51

Tags: c# parallel-processing dotnet-httpclient

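I want to avoid an application crash caused by a parallel for loop and HttpClient, but because my programming knowledge is limited, I have been unable to apply the solutions offered elsewhere on the web. My code is pasted below.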

class Program
    {
        public static List<string> words = new List<string>();
        public static int count = 0;
        public static string output = "";
        private static HttpClient Client = new HttpClient();
        public static void Main(string[] args)
        {
            //input path strings...
            List<string> links = new List<string>();
            links.AddRange(File.ReadAllLines(input));
            List<string> longList = new List<string>(File.ReadAllLines(@"a.txt"));
            words.AddRange(File.ReadAllLines(output1));
            System.Net.ServicePointManager.DefaultConnectionLimit = 8;
            count = longList.Count;
            //for (int i = 0; i < longList.Count; i++)
            Task.Run(() => Parallel.For(0, longList.Count, new ParallelOptions { MaxDegreeOfParallelism = 5 }, (i, loopState) =>
            {
                Console.WriteLine(i);
                string link = @"some link" + longList[i] + "/";
                try
                {
                    if (!links.Contains(link))
                    {
                        Task.Run(async () => { await Download(link); }).Wait();
                    }
                }
                catch (System.Exception e)
                {

                }
            }));
            //}

        }
        public static async Task Download(string link)
        {
            HtmlAgilityPack.HtmlDocument document = new HtmlDocument();
            document.LoadHtml(await getURL(link));
            //...stuff with html agility pack
        }
        public static async Task<string> getURL(string link)
        {
            string result = "";
            HttpResponseMessage response = await Client.GetAsync(link);
            Console.WriteLine(response.StatusCode);
            if(response.IsSuccessStatusCode)
            {
                HttpContent content = response.Content;
                var bytes = await response.Content.ReadAsByteArrayAsync();
                result = Encoding.UTF8.GetString(bytes);
            }
            return result;
        }

    }

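There are some solutions, such as this one, but I don't know how to put the await keyword in my Main method, and at the moment the program simply exits because the await is missing. As you can see, I have already applied a workaround that uses Task.Run() to call it from the Main method. I also have doubts about using the same HttpClient instance across different parallel threads. Please tell me whether I should create a new HttpClient instance each time.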

1 answer:

Answer 0 (score: 0)

You're right that you have to block on a task somewhere in a console application; otherwise the program exits before the work completes. But you're blocking more often than you need to. Aim to block only the main thread and delegate everything else to an async method. A good practice is to create a method with a signature like private static async Task MainAsync(string[] args), put the "guts" of your program logic there, and call it from Main like this:

MainAsync(args).Wait();
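To make the shape of that pattern concrete, here is a minimal sketch of the two methods together. The body of MainAsync is a placeholder (the Task.Delay and the printed message are assumptions for illustration, not part of the original program). It blocks with GetAwaiter().GetResult() rather than Wait(); both block the main thread, but the former rethrows the original exception instead of wrapping it in an AggregateException.

```csharp
using System;
using System.Threading.Tasks;

public static class Program
{
    // Synchronous entry point: the only place in the program that blocks.
    public static void Main(string[] args)
    {
        MainAsync(args).GetAwaiter().GetResult();
    }

    // All real program logic lives here, so it can use await freely.
    private static async Task MainAsync(string[] args)
    {
        // Placeholder for awaited work such as reading files or downloading pages.
        await Task.Delay(100);
        Console.WriteLine("finished");
    }
}
```

If the project can target C# 7.1 or later, the compiler also accepts an entry point declared as static async Task Main(string[] args), which removes the need for the wrapper.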

In your example, move everything from Main to MainAsync. Then you're free to use await as much as you want. Task.Run and Parallel.For explicitly tie up thread-pool threads for I/O-bound work, which is unnecessary in the async world. Use Task.WhenAll instead. The last part of your MainAsync method should end up looking something like this:

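// Select's (item, index) overload supplies i, which the original loop printed.
await Task.WhenAll(longList.Select(async (s, i) => {
    Console.WriteLine(i);
    string link = @"some link" + s + "/";
    try
    {
        if (!links.Contains(link))
        {
            await Download(link);
        }
    }
    catch (System.Exception e)
    {
        // Report the failure instead of silently swallowing it.
        Console.WriteLine(link + " failed: " + e.Message);
    }
}));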

There is one little wrinkle here though. Your example throttles the parallelism to 5, whereas Task.WhenAll over a Select starts every download at once. If you find you still need the limit, TPL Dataflow is a great library for throttled parallelism in the async world. Here's a simple example.
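As a rough illustration of what throttling with TPL Dataflow can look like, the sketch below wraps an ActionBlock in a helper. The class name ThrottledRunner and the method ForEachAsync are hypothetical names chosen for this example. It assumes the System.Threading.Tasks.Dataflow library is available (on .NET Framework that means adding the NuGet package of the same name).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ThrottledRunner
{
    // Hypothetical helper: runs handler once per item, with at most
    // maxParallel invocations in flight at any moment.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, Func<T, Task> handler, int maxParallel)
    {
        var block = new ActionBlock<T>(handler, new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxParallel
        });

        foreach (var item in items)
        {
            // The block's input queue is unbounded by default, so Post accepts every item.
            block.Post(item);
        }

        block.Complete();       // signal that no more input is coming
        await block.Completion; // completes once every queued item has been handled
    }
}
```

Inside MainAsync this could be called as await ThrottledRunner.ForEachAsync(longList, s => Download(@"some link" + s + "/"), 5);. Note that an exception escaping the handler faults the block, so keeping a try/catch inside the handler, as in the snippet above, still makes sense.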

Regarding HttpClient, using a single instance across threads is completely safe and highly encouraged.
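A small sketch of the shared-instance approach follows. So that it runs without network access, it plugs in a stub message handler; StubHandler, SharedClientDemo, FetchAllAsync and the example.test URLs are all hypothetical names for illustration. In a real program you would construct the HttpClient without the stub, as in the question's code.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a real server: answers every request with 200 OK.
public sealed class StubHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("page for " + request.RequestUri)
        };
        return Task.FromResult(response);
    }
}

public static class SharedClientDemo
{
    // One instance for the whole process, reused by every request.
    private static readonly HttpClient Client = new HttpClient(new StubHandler());

    public static async Task<string[]> FetchAllAsync(string[] urls)
    {
        // Several requests are issued concurrently through the same instance.
        return await Task.WhenAll(urls.Select(url => Client.GetStringAsync(url)));
    }
}
```

Creating a new HttpClient per request, by contrast, opens new connections each time and can exhaust sockets under load, which is one reason the single shared instance is the usual recommendation.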