Keeping TCP ports open with HttpClient in C#

Time: 2014-12-29 19:31:28

Tags: asynchronous dotnet-httpclient

I am new to asynchronous programming, and I am trying to use HttpClient to fire off bulk URL requests for page content. Here is my attempt:

    private async void ProcessUrlAsyncWithHttp(HttpClient httpClient, string purl)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        HttpResponseMessage response = null;
        try
        {
            Interlocked.Increment(ref _activeRequestsCount);
            var request = new HttpRequestMessage()
            {
                RequestUri = new Uri(purl),
                Method = HttpMethod.Get,
            };
            request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36");
            request.Headers.TryAddWithoutValidation("Accept", "text/html,*.*");
            request.Headers.TryAddWithoutValidation("Connection", "Keep-Alive");
            request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate, sdch");
            request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.8");

            response = await httpClient.SendAsync(request).ConfigureAwait(false);
            string html = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
            response.Dispose();
            if (IsCaptcha(html)) throw new Exception("Captcha was returned");
            request.Dispose();
            Interlocked.Increment(ref _successfulCalls);
        }
        catch (HttpRequestException hex)
        {
            Console.WriteLine("http:" + hex.Message);
            Interlocked.Increment(ref _failedCalls);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.GetType().AssemblyQualifiedName + " " + ex.Message);
            Interlocked.Increment(ref _failedCalls);
        }
        finally
        {
            Interlocked.Decrement(ref _activeRequestsCount);
            Interlocked.Decrement(ref _itemsLeft);
            if (response != null) response.Dispose();
            if (httpClient != null) httpClient.Dispose();
            sw.Stop();
            DateTime currentTime = DateTime.UtcNow;
            TimeSpan elapsedTillNow = (currentTime - _overallStartTime).Duration();
            Console.WriteLine("Left:" + _itemsLeft + ", Current execution:" + sw.ElapsedMilliseconds + " (ms), Average execution:" + Math.Round((elapsedTillNow.TotalMilliseconds / (_totalItems - _itemsLeft)), 0) + " (ms)");

            lock(_syncLock)
            {
                if (_itemsLeft == 0)
                {
                    _overallEndTime = DateTime.UtcNow;
                    this.DisplayTestResults();
                }
            }                
        }

    }

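As you can see, I am passing the HttpClient into the function, and it is disposed after each URL is downloaded. I know this is overkill and that ideally we should reuse the HttpClient. However, I cannot use a single HttpClient with a different proxy for each URL: the handler has to be passed to the HttpClient constructor and cannot be changed afterwards, so a new proxy cannot be supplied without recreating the HttpClient object. That is why I need to use this approach.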

On the calling side, I have very basic code:

    public async void TestAsyncWithHttp()
    {
        ServicePointManager.DefaultConnectionLimit = 10;
        //ServicePointManager.UseNagleAlgorithm = false; 
        List<string> urlList = SetUpURLList();
        urlList = urlList.GetRange(1, 50);
        _itemsLeft = urlList.Count();
        _totalItems = _itemsLeft;
        List<string> proxies = new List<string>();
        proxies.Add("124.161.94.8:80");
        proxies.Add("183.207.228.8:80");
        proxies.Add("202.29.97.5:3128");
        proxies.Add("210.75.14.158:80");
        proxies.Add("203.100.80.81:8080");
        proxies.Add("218.207.172.236:80");
        proxies.Add("218.59.144.120:81");
        proxies.Add("218.59.144.95:80");
        proxies.Add("218.28.35.234:8080");
        proxies.Add("222.88.236.236:83");
        Random rnd = new Random();
        foreach (string url in urlList)
        {
            int ind = rnd.Next(0, proxies.Count); // upper bound is exclusive, so Count (not Count-1) lets every proxy be picked
            var httpClientHandler = new HttpClientHandler
                    {
                        Proxy = new WebProxy(proxies.ElementAt(ind), false),
                        UseProxy = true
                    };
            HttpClient httpClient = new HttpClient(httpClientHandler);
            //HttpClient httpClient = new HttpClient();

            httpClient.Timeout = TimeSpan.FromMinutes(2);
            ProcessUrlAsyncWithHttp(httpClient, url);
        }
    }

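Questions:

1) Why is the TCP port closed after every request? I would like to open as many ports as the maximum connection count and reuse them between calls. For example, in the sample above I can have 10 concurrent connections, so I would expect this to open 10 TCP ports, with the remaining 40 requests then using those 10 ports one after another. This is the normal, expected behavior with HttpWebRequest. I have working code using HttpWebRequest that shows this port-reuse behavior, and I can post it if anyone wants to see it. So it is a bit strange that HttpClient does not mimic this behavior, even though it is based on HttpWebRequest.

2) How do we set auto-redirect to false for such calls?

3) I intend to use this function for a large number of calls, say around 50K. Is there anything wrong with the way the code is written that may need correcting?

4) Suppose I somehow manage to use a single HttpClient object instead of one object per request. Is there any way to make sure I can read the cookies of all these individual requests, and change them if necessary, keeping in mind that I have a single HttpClient for all the URL requests?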

Thanks, Kallol
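
For questions 2 and 4, one possible direction is sketched below. It is not part of the original post: the class `PerRequestCookies` and its method names are hypothetical, and it assumes that turning off the handler's built-in cookie handling (`UseCookies = false`) is acceptable, so that cookies are sent and read as plain headers on each request even when the HttpClient is shared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class PerRequestCookies
{
    // Handler settings that relate to questions 2 and 4.
    public static HttpClientHandler CreateHandler()
    {
        return new HttpClientHandler
        {
            AllowAutoRedirect = false, // 3xx responses are handed back to the caller
            UseCookies = false         // no shared CookieContainer; cookies travel as plain headers
        };
    }

    // Builds a GET request that carries an explicit Cookie header (may be null or empty).
    public static HttpRequestMessage BuildRequest(string url, string cookieHeader)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url));
        if (!string.IsNullOrEmpty(cookieHeader))
        {
            request.Headers.TryAddWithoutValidation("Cookie", cookieHeader);
        }
        return request;
    }

    // Returns the raw Set-Cookie header values of a response, or an empty list.
    public static IList<string> ReadSetCookies(HttpResponseMessage response)
    {
        IEnumerable<string> values;
        return response.Headers.TryGetValues("Set-Cookie", out values)
            ? values.ToList()
            : new List<string>();
    }

    // Sends one request on a shared client and returns the cookies the server set for it.
    public static async Task<IList<string>> SendAsync(HttpClient client, string url, string cookieHeader)
    {
        using (HttpRequestMessage request = BuildRequest(url, cookieHeader))
        using (HttpResponseMessage response = await client.SendAsync(request).ConfigureAwait(false))
        {
            return ReadSetCookies(response);
        }
    }
}
```

With this arrangement the caller decides which cookies go out with each URL and sees what came back for that URL only, at the cost of parsing the `Set-Cookie` values itself.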

2 answers:

Answer 0 (score: 1)

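In my experience, it is enough to reuse the HttpClientHandler objects, which are what actually manage the connection pool, and to recreate the HttpClient object for each request (using the constructor that takes an HttpClientHandler parameter). I once ran into a similar TCP port congestion problem, with ports constantly being closed, while hitting a server with about 6000 connections per minute.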
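
A minimal sketch of that idea, assuming one cached handler per proxy address. The class `ProxyHandlerCache` and its method names are hypothetical and not from the question. Note that the two-argument `HttpClient(handler, disposeHandler)` overload is used with `disposeHandler: false`; otherwise disposing the short-lived HttpClient would also dispose the shared handler.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: caches one HttpClientHandler per proxy address.
public static class ProxyHandlerCache
{
    private static readonly ConcurrentDictionary<string, HttpClientHandler> Handlers =
        new ConcurrentDictionary<string, HttpClientHandler>();

    // Returns the shared handler for a proxy, creating it on first use.
    // Under a race GetOrAdd may build a spare handler that is never stored;
    // that is tolerated here to keep the sketch short.
    public static HttpClientHandler GetHandler(string proxyAddress)
    {
        return Handlers.GetOrAdd(proxyAddress, address => new HttpClientHandler
        {
            Proxy = new WebProxy(address, false),
            UseProxy = true
        });
    }

    // Creates a short-lived HttpClient on top of the long-lived, shared handler.
    public static async Task<string> DownloadAsync(string url, string proxyAddress)
    {
        // disposeHandler: false leaves the shared handler alive after the client is disposed.
        using (var client = new HttpClient(GetHandler(proxyAddress), disposeHandler: false))
        {
            client.Timeout = TimeSpan.FromMinutes(2);
            using (HttpResponseMessage response = await client.GetAsync(url).ConfigureAwait(false))
            {
                return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
            }
        }
    }
}
```

In the question's loop, this would replace the per-URL `new HttpClientHandler { ... }` with a call such as `ProxyHandlerCache.DownloadAsync(url, proxies[ind])`, so at most one handler exists per proxy for the whole run.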

Hope this helps.

Matthias

Answer 1 (score: -1)

Have you tried putting the HttpClient code into a class and creating 10 instances of that class, each with its own HttpClient?
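
One way this suggestion could look, as a rough sketch only. `ProxyWorker` and `WorkerPool` are hypothetical names, and the sketch assumes the URLs can be split evenly across the proxies in advance instead of picking a random proxy per URL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical wrapper: one long-lived HttpClient bound to one proxy.
public sealed class ProxyWorker : IDisposable
{
    private readonly HttpClient _client;

    public ProxyWorker(string proxyAddress)
    {
        _client = new HttpClient(new HttpClientHandler
        {
            Proxy = new WebProxy(proxyAddress, false),
            UseProxy = true
        });
        _client.Timeout = TimeSpan.FromMinutes(2);
    }

    // Downloads the assigned URLs one after another on this worker's client
    // and returns how many of them completed without an exception.
    public async Task<int> ProcessAsync(IEnumerable<string> urls)
    {
        int succeeded = 0;
        foreach (string url in urls)
        {
            try
            {
                using (HttpResponseMessage response = await _client.GetAsync(url).ConfigureAwait(false))
                {
                    await response.Content.ReadAsStringAsync().ConfigureAwait(false);
                    succeeded++;
                }
            }
            catch (HttpRequestException ex)
            {
                Console.WriteLine("http:" + ex.Message);
            }
            catch (TaskCanceledException)
            {
                Console.WriteLine("timeout:" + url);
            }
        }
        return succeeded;
    }

    public void Dispose()
    {
        _client.Dispose();
    }
}

public static class WorkerPool
{
    // Splits the URLs round-robin into one bucket per worker.
    public static List<List<string>> Partition(IList<string> urls, int workerCount)
    {
        if (workerCount <= 0) throw new ArgumentOutOfRangeException("workerCount");
        var buckets = Enumerable.Range(0, workerCount).Select(_ => new List<string>()).ToList();
        for (int i = 0; i < urls.Count; i++)
        {
            buckets[i % workerCount].Add(urls[i]);
        }
        return buckets;
    }

    // Creates one worker per proxy, runs them concurrently, and disposes them at the end.
    public static async Task<int> RunAsync(IList<string> urls, IList<string> proxies)
    {
        List<ProxyWorker> workers = proxies.Select(p => new ProxyWorker(p)).ToList();
        try
        {
            List<List<string>> buckets = Partition(urls, workers.Count);
            int[] counts = await Task.WhenAll(
                workers.Select((w, i) => w.ProcessAsync(buckets[i]))).ConfigureAwait(false);
            return counts.Sum();
        }
        finally
        {
            foreach (ProxyWorker worker in workers) worker.Dispose();
        }
    }
}
```

Each worker issues its requests sequentially, so with 10 proxies there are at most 10 requests in flight, and every HttpClient lives for the whole batch instead of a single URL.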