I am new to asynchronous programming, and I am trying to use HttpClient to fire off a batch of URL requests for page content. Here is my attempt:
private async void ProcessUrlAsyncWithHttp(HttpClient httpClient, string purl)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    HttpRequestMessage request = null;
    HttpResponseMessage response = null;
    try
    {
        Interlocked.Increment(ref _activeRequestsCount);
        request = new HttpRequestMessage()
        {
            RequestUri = new Uri(purl),
            Method = HttpMethod.Get,
        };
        request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36");
        request.Headers.TryAddWithoutValidation("Accept", "text/html,*/*");
        request.Headers.TryAddWithoutValidation("Connection", "Keep-Alive");
        request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate, sdch");
        request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.8");
        response = await httpClient.SendAsync(request).ConfigureAwait(false);
        string html = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        if (IsCaptcha(html)) throw new Exception("Captcha was returned");
        Interlocked.Increment(ref _successfulCalls);
    }
    catch (HttpRequestException hex)
    {
        Console.WriteLine("http:" + hex.Message);
        Interlocked.Increment(ref _failedCalls);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.GetType().AssemblyQualifiedName + " " + ex.Message);
        Interlocked.Increment(ref _failedCalls);
    }
    finally
    {
        Interlocked.Decrement(ref _activeRequestsCount);
        // Use the value returned by Decrement so exactly one call sees 0.
        int left = Interlocked.Decrement(ref _itemsLeft);
        if (response != null) response.Dispose();
        if (request != null) request.Dispose();
        if (httpClient != null) httpClient.Dispose();
        sw.Stop();
        DateTime currentTime = DateTime.UtcNow;
        TimeSpan elapsedTillNow = (currentTime - _overallStartTime).Duration();
        Console.WriteLine("Left:" + left + ", Current execution:" + sw.ElapsedMilliseconds + " (ms), Average execution:" + Math.Round(elapsedTillNow.TotalMilliseconds / (_totalItems - left), 0) + " (ms)");
        lock (_syncLock)
        {
            if (left == 0)
            {
                _overallEndTime = DateTime.UtcNow;
                this.DisplayTestResults();
            }
        }
    }
}
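As you can see, I pass the HttpClient into the function and dispose of it after each URL is downloaded. I know this is overkill and that ideally the HttpClient should be reused. But I cannot use a single HttpClient with a different proxy for each URL: the handler has to be passed to the HttpClient constructor and cannot be changed afterwards, so there is no way to supply a new proxy without recreating the HttpClient object. That is why I need this approach.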
On the calling side, I have very basic code:
public void TestAsyncWithHttp()
{
    ServicePointManager.DefaultConnectionLimit = 10;
    //ServicePointManager.UseNagleAlgorithm = false;
    List<string> urlList = SetUpURLList();
    urlList = urlList.GetRange(1, 50);
    _itemsLeft = urlList.Count;
    _totalItems = _itemsLeft;
    var proxies = new List<string>
    {
        "124.161.94.8:80",
        "183.207.228.8:80",
        "202.29.97.5:3128",
        "210.75.14.158:80",
        "203.100.80.81:8080",
        "218.207.172.236:80",
        "218.59.144.120:81",
        "218.59.144.95:80",
        "218.28.35.234:8080",
        "222.88.236.236:83",
    };
    Random rnd = new Random();
    foreach (string url in urlList)
    {
        // Random.Next's upper bound is exclusive, so this can pick any proxy.
        int ind = rnd.Next(proxies.Count);
        var httpClientHandler = new HttpClientHandler
        {
            Proxy = new WebProxy(proxies[ind], false),
            UseProxy = true
        };
        HttpClient httpClient = new HttpClient(httpClientHandler);
        //HttpClient httpClient = new HttpClient();
        httpClient.Timeout = TimeSpan.FromMinutes(2);
        ProcessUrlAsyncWithHttp(httpClient, url);
    }
}
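Because ProcessUrlAsyncWithHttp is async void, the loop above starts every request at once and has no way to observe when they finish or how many are in flight. A common alternative, sketched here assuming the per-URL method is changed to return Task and that C# 8 or later is available, is to throttle with SemaphoreSlim and await everything with Task.WhenAll. ThrottledRunner and RunAsync are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs `work` for every item with at most
// `maxConcurrency` invocations in flight, and returns a Task that
// completes only when all of them have finished.
public static class ThrottledRunner
{
    public static async Task RunAsync<T>(
        IEnumerable<T> items, Func<T, Task> work, int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency, maxConcurrency);

        async Task RunOneAsync(T item)
        {
            // Wait for a free slot before starting this item's work.
            await gate.WaitAsync().ConfigureAwait(false);
            try
            {
                await work(item).ConfigureAwait(false);
            }
            finally
            {
                // Always free the slot, even if `work` threw.
                gate.Release();
            }
        }

        // Every item gets a task; the semaphore, not the loop, limits concurrency.
        await Task.WhenAll(items.Select(RunOneAsync)).ConfigureAwait(false);
    }
}
```

Usage, assuming a hypothetical ProcessUrlAsync(string url) that returns Task: await ThrottledRunner.RunAsync(urlList, ProcessUrlAsync, 10);. For 50K URLs this still creates one waiting task per URL up front; that is usually acceptable, but a fixed number of worker loops pulling from a shared queue avoids it.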
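My questions are:
1) Why is the TCP port closed after every request? I want to open up to the maximum number of connections and reuse them across calls. For example, in the sample above I can have 10 concurrent connections, so I expect 10 TCP ports to be opened and the remaining 40 requests to reuse those 10 ports in turn. This is the normal, expected behavior with HttpWebRequest. I have working code based on HttpWebRequest that shows this port reuse, and I can post it for anyone who wants to see it. So it is a bit odd that HttpClient does not mimic this behavior, even though it is built on HttpWebRequest.
2) How can we set AutoRedirect to false for calls like these?
3) I plan to use this function for a large number of calls, around 50K. Is there anything wrong with the way the code is written that might need correcting?
4) Suppose I somehow manage to use a single HttpClient object instead of one per request. Is there a way to make sure I read the cookies for all of these separate requests, and change them when necessary, bearing in mind that I have one HttpClient instance for the whole set of URL requests?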
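Questions 2 and 4 are both handler settings rather than per-request options. A minimal sketch, assuming one long-lived client per proxy: AllowAutoRedirect = false makes the handler return 3xx responses as-is instead of following them, and a CookieContainer assigned to the handler stores the cookies from every response sent through that client, where they can be read with GetCookies and overwritten with Add. ClientFactory and SetCookie are hypothetical names. The container is shared by every request on that client, so cookies are separated per site by the Uri you query, not per request.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class ClientFactory
{
    // Hypothetical factory: one handler and one long-lived client per proxy.
    public static (HttpClient Client, HttpClientHandler Handler) Create(string proxyAddress)
    {
        var handler = new HttpClientHandler
        {
            Proxy = new WebProxy(proxyAddress, false),
            UseProxy = true,
            // Return redirects to the caller instead of following them.
            AllowAutoRedirect = false,
            // Store response cookies here and send matching ones on later requests.
            UseCookies = true,
            CookieContainer = new CookieContainer(),
        };
        var client = new HttpClient(handler) { Timeout = TimeSpan.FromMinutes(2) };
        return (client, handler);
    }

    // Adds a cookie for `site`, replacing an existing one with the same name and path.
    public static void SetCookie(CookieContainer cookies, Uri site, string name, string value)
        => cookies.Add(site, new Cookie(name, value));
}
```

After a request completes, foreach (Cookie c in handler.CookieContainer.GetCookies(new Uri(url))) lists what the server set for that URL.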
Kallol
Answer 0 (score: 1)
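In my experience (I once ran into a similar TCP port congestion problem, with ports constantly being closed while I was hitting a server with about 6,000 connections per minute), it is enough to reuse the HttpClientHandler objects, which are what actually manage the connection pool, and simply recreate the HttpClient object for each request using the constructor that takes the handler as a parameter. Use the HttpClient(HttpMessageHandler handler, bool disposeHandler) overload with disposeHandler set to false, so that disposing the client does not also dispose the shared handler.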
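A minimal sketch of this approach, assuming one shared handler per proxy: the HttpClientHandler owns the connection pool and lives for the whole run, while a short-lived HttpClient is created per request with disposeHandler: false, so disposing the client no longer tears down the pool. PerRequestClientFetcher is a hypothetical name.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public sealed class PerRequestClientFetcher : IDisposable
{
    // Long-lived handler: this is what keeps pooled connections open.
    public HttpClientHandler Handler { get; }

    public PerRequestClientFetcher(string proxyAddress)
    {
        Handler = new HttpClientHandler
        {
            Proxy = new WebProxy(proxyAddress, false),
            UseProxy = true,
        };
    }

    // disposeHandler: false means disposing this client leaves Handler alive.
    public HttpClient CreateClient() =>
        new HttpClient(Handler, disposeHandler: false) { Timeout = TimeSpan.FromMinutes(2) };

    public async Task<string> GetStringAsync(string url)
    {
        using (var client = CreateClient())
        {
            return await client.GetStringAsync(url).ConfigureAwait(false);
        }
    }

    // Disposing the fetcher is what finally closes the pooled connections.
    public void Dispose() => Handler.Dispose();
}
```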
Hope this helps.
Matthias
Answer 1 (score: -1)
Have you tried putting the HttpClient code into a class and creating 10 instances of that class, each with its own HttpClient?
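One way to read this suggestion, sketched with hypothetical ProxiedClient and ProxiedClientPool classes: each instance owns a single long-lived HttpClient bound to one proxy, and the pool hands them out round-robin, so with the ten proxies from the question there are ten clients whose connections can be reused across all requests. Rotating through proxies instead of picking one at random is a design choice of this sketch, not something the answer specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: one long-lived HttpClient bound to one proxy.
public sealed class ProxiedClient : IDisposable
{
    private readonly HttpClient _client;
    public string ProxyAddress { get; }

    public ProxiedClient(string proxyAddress)
    {
        ProxyAddress = proxyAddress;
        var handler = new HttpClientHandler
        {
            Proxy = new WebProxy(proxyAddress, false),
            UseProxy = true,
        };
        // The client owns (and will dispose) its handler.
        _client = new HttpClient(handler) { Timeout = TimeSpan.FromMinutes(2) };
    }

    public Task<string> GetStringAsync(string url) => _client.GetStringAsync(url);

    public void Dispose() => _client.Dispose();
}

// Hands out the wrapped clients in round-robin order; safe to call from many tasks.
public sealed class ProxiedClientPool : IDisposable
{
    private readonly ProxiedClient[] _clients;
    private int _next = -1;

    public ProxiedClientPool(IEnumerable<string> proxies)
    {
        _clients = proxies.Select(p => new ProxiedClient(p)).ToArray();
        if (_clients.Length == 0)
            throw new ArgumentException("At least one proxy is required.", nameof(proxies));
    }

    public ProxiedClient Next()
    {
        // Interlocked keeps the counter consistent under concurrency;
        // masking with int.MaxValue keeps the index non-negative after overflow.
        int i = Interlocked.Increment(ref _next) & int.MaxValue;
        return _clients[i % _clients.Length];
    }

    public void Dispose()
    {
        foreach (var c in _clients) c.Dispose();
    }
}
```

Each URL would then be fetched with await pool.Next().GetStringAsync(url), and the pool disposed once after the whole batch.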