.NET TPL threading performance problem

Time: 2018-05-20 11:09:08

Tags: .net multithreading performance task-parallel-library semaphore

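I am working on a web scraping project using .NET Core 2.0. My client has provided about 1 million domains and asked me to check whether each domain is active and whether it returns a proper response. My code is below. Reading the domains from the file causes no problems; performance there is very good.

My problem is that when I use PLINQ to check whether the domains are active, it uses only one thread, and 25,000 domains take about an hour. When I use the Semaphore approach, performance is good, but sometimes the number of domains in the result does not match the input. For example, out of 25,000 domains I get a result such as 24,798, and I don't know where the remaining 202 domains went. How can I improve performance, and what is missing in my code? Please help. I have provided both the PLINQ and Semaphore versions of the code here.

Semaphore version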

        var semaphore = new Semaphore(200, 225);
        var allDomains = new List<BsonDocument>();
        try
        {
            foreach (var domain in domainList)
            {
                var cT1 = Task.Factory.StartNew(() =>
                {
                    try
                    {
                        semaphore.WaitOne();
                        Interlocked.Increment(ref countThreads);
                        var active = IsDomainActive(domain) ? true : false;
                        lock (allDomains) allDomains.Add(
                            new BsonDocument
                            {
                                {"Url", domain},                                    
                                {"Active", active},                                    
                                {"CreatedOn", DateTime.SpecifyKind(DateTime.Now, DateTimeKind.Local)},
                                {"UpdatedOn", DateTime.SpecifyKind(DateTime.Now, DateTimeKind.Local)}
                            }
                        );
                    }
                    finally
                    {
                        semaphore.Release();
                        Interlocked.Decrement(ref countThreads);
                    }
                }, TaskCreationOptions.LongRunning);
            }
        }
        finally
        {
            if (semaphore != null)
            {
                semaphore.Dispose();
                semaphore = null;
            }
        }

PLINQ version

    var allDomains = (
            from domain in domainList.AsParallel().WithCancellation(cancellationToken).WithDegreeOfParallelism(7).WithExecutionMode(ParallelExecutionMode.ForceParallelism)
            where IsDomainActive(domain)
            select new BsonDocument
            {
                {"Url", domain},                   
                {"CreatedOn", DateTime.SpecifyKind(DateTime.Now, DateTimeKind.Local)},
                {"UpdatedOn", DateTime.SpecifyKind(DateTime.Now, DateTimeKind.Local)}
            }).ToList();


    private static bool IsDomainActive(string url)
    {
        var domain = new StringBuilder();
        domain.Append("http://");
        domain.Append(url);
        Console.WriteLine($"IsDomainActive: {url:00} - On Thread " + $"{Thread.CurrentThread.ManagedThreadId:00}. Concurrent: {countThreads}");
        try
        {                               
            var request = (HttpWebRequest) WebRequest.Create(new Uri(domain.ToString()));             
            request.Timeout = 5000;
            request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
            var response = (HttpWebResponse)request.GetResponse();
            return (response == null || response.StatusCode != HttpStatusCode.OK) ? false : true;
        }
        catch (Exception e)
        {
            return false;
        }
    }

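My computer's configuration is as follows; the problem occurs in both environments.

  1. RAM: 64GB
  2. Processor: 8 cores
  3. Windows 10

My Linux server's configuration is below:

  1. RAM: 4GB
  2. HDD: 80GB SSD
  3. OS: Ubuntu 16.04
  4. Processor: 4 cores

EDIT1

I have updated the code to use HttpClient.GetAsync, but performance is still slow; even 1,000 domains take a long time.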

      private static async Task<bool> IsDomainActive(string url)
          {
              var domain = new StringBuilder();
              domain.Append("http://");
              domain.Append(url);
              Console.WriteLine("Processing Domain: " + url);
              try
              {
                  var sessionId = (new Random()).Next().ToString();
                  var netProxy = new WebProxy("<proxyserver>", port);
                  login = "<login>";
                  password = "<password>";
                  netProxy.Credentials = new NetworkCredential(login, password);
                  var handler = new HttpClientHandler()
                  {
                      Proxy = netProxy,
                      UseProxy = true,
                  };
                  var httpClient = new HttpClient(handler);
                  var request = new HttpRequestMessage() {
                      RequestUri = new Uri(domain.ToString()),
                      Method = HttpMethod.Get
                  };
                  request.Headers.Add("Timeout","5000");
                  request.Headers.Add("UserAgent","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2");
                  var response = await httpClient.SendAsync(request).ConfigureAwait(false);
                  response.EnsureSuccessStatusCode();
                  return true;
              }
              catch (Exception e)
              {
                  return false;
              }
          }
      

EDIT2

Changed the List<> to a concurrent collection.

      private static List<BsonDocument> ProcessFile(ConcurrentBag<string> domains, IProgress<string> progress,
              CancellationToken cancellationToken)
          {
      
              var allDomains = (from domain in domains.AsParallel().WithCancellation(cancellationToken)
                      .WithDegreeOfParallelism(Environment.ProcessorCount)
                      .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
                  where IsDomainActive(domain).Result
                  select new BsonDocument
                  {
                              {"Url", domain},                           
                              {"Protocol", "http"},
                              {"CreatedOn", DateTime.SpecifyKind(DateTime.Now, DateTimeKind.Local)},
                              {"UpdatedOn", DateTime.SpecifyKind(DateTime.Now, DateTimeKind.Local)}
                  }).ToList();
              return allDomains;
          }
      

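1 Answer:

Answer 0 (score: 0)

I have figured out the problem, tested the same functionality on Linux, and it works fine. Thank you all. Instead of the semaphore approach, I used Parallel.ForEach(), checked the thread creation, and validated the results against the input. Everything looks good for tomorrow's demo.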
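
The answer does not include its code, so here is only a minimal sketch of what a Parallel.ForEach version with result validation might look like. One plausible reason the semaphore version lost results is that nothing waits for the tasks started with Task.Factory.StartNew, and the semaphore is disposed as soon as the loop ends, so the list may be read before every task has added its entry. Parallel.ForEach avoids that because it blocks until all items are processed. The names DomainCheckSketch, CheckAll and IsDomainActiveStub are hypothetical; the stub stands in for the question's IsDomainActive so the example runs without network access, and the degree of parallelism of 50 is an arbitrary value for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class DomainCheckSketch
{
    // Hypothetical stand-in for the question's IsDomainActive (no real HTTP call).
    private static bool IsDomainActiveStub(string domain) => !domain.StartsWith("bad");

    public static List<KeyValuePair<string, bool>> CheckAll(
        IReadOnlyCollection<string> domains, int maxParallel)
    {
        // Thread-safe collection, so no explicit lock is needed when adding.
        var results = new ConcurrentBag<KeyValuePair<string, bool>>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Blocks until every domain has been processed.
        Parallel.ForEach(domains, options, domain =>
        {
            var active = IsDomainActiveStub(domain);
            results.Add(new KeyValuePair<string, bool>(domain, active));
        });

        // Validate the result count against the input, as the answer describes.
        if (results.Count != domains.Count)
        {
            throw new InvalidOperationException(
                $"Expected {domains.Count} results but got {results.Count}.");
        }

        return results.ToList();
    }

    public static void Main()
    {
        // Synthetic input: every tenth domain is marked "bad" (inactive in the stub).
        var domains = Enumerable.Range(0, 1000)
            .Select(i => (i % 10 == 0 ? "bad" : "ok") + i + ".example")
            .ToList();

        var results = CheckAll(domains, 50);
        Console.WriteLine($"{results.Count} of {domains.Count} domains checked");
    }
}
```

Since the real check is I/O-bound rather than CPU-bound, a MaxDegreeOfParallelism higher than the core count may be reasonable here, but the right value depends on the network and the target servers and would need to be measured.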