I have a large database with a list of site URLs that need to be processed, and the problem is that it takes a very long time. I haven't written anything in C# in ages and have forgotten most of it. Tell me, is it actually possible to read through the file quickly, collect the lines, and send each one off for processing? Let me explain: the file contains 5 million links. At runtime the program should read all the lines quickly and fire off 5 million HttpWebRequest requests asynchronously, so that all that's left is to wait. If that isn't possible, how can I speed this up?
The processing is started with a call to:
startSearch();
Answer (score: 1)
According to the HttpWebRequest documentation:

"We don't recommend that you use HttpWebRequest for new development. Instead, use the System.Net.Http.HttpClient class."

HttpClient also performs well, reuses connections by default, and its async APIs are available out of the box. Here is an example using HttpClient plus a few other tweaks; only the methods that changed are shown:
// HttpClient is intended to be instantiated once per application, rather than per-use.
private static readonly HttpClient httpClient = new HttpClient();
private void Form1_Load(object sender, EventArgs e)
{
// ...existing code...
ServicePointManager.DefaultConnectionLimit = 10; // this line is not needed in .NET Core
httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36");
httpClient.DefaultRequestHeaders.Connection.ParseAdd("keep-alive");
httpClient.Timeout = TimeSpan.FromSeconds(4);
}
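A note on the .NET Core comment above: there, HttpClient ignores ServicePointManager, and the per-server connection limit is configured on the handler instead. A minimal sketch, assuming you still want a limit of roughly 10, would declare the httpClient field like this:

// .NET Core / .NET 5+ only: the connection limit lives on SocketsHttpHandler, not ServicePointManager.
private static readonly HttpClient httpClient = new HttpClient(new SocketsHttpHandler
{
    MaxConnectionsPerServer = 10,                      // rough equivalent of DefaultConnectionLimit
    PooledConnectionLifetime = TimeSpan.FromMinutes(2) // recycle pooled connections now and then
});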
// rewritten method
public async Task<string[]> cURL(string url, SemaphoreSlim semaphore)
{
try
{
if (!url.StartsWith("http")) url = "http://" + url;
using (HttpResponseMessage response = await httpClient.GetAsync(url, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false))
{
if (response.IsSuccessStatusCode)
{
string statusGet = ((int)response.StatusCode).ToString();
string respURL = response.RequestMessage.RequestUri.ToString();
string requestResult = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
HashSet<string> emails = new HashSet<string>();
Regex ItemRegex = new Regex(@"[a-z0-9_\-\+]+@[a-z0-9\-]+\.((?!.*png|.*jpg)[a-z]{2,10})(?:\.[a-z]{2})?", RegexOptions.Compiled);
foreach (Match ItemMatch in ItemRegex.Matches(requestResult))
{
emails.Add(ItemMatch.ToString());
}
string getCMS = GetCMS(requestResult);
string emailList = string.Join(";", emails);
return new[] { url, statusGet, respURL, getCMS, emailList };
}
}
return null;
}
catch
{
return null;
}
finally
{
semaphore.Release();
}
}
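GetCMS is assumed to be your existing helper from the question; it is not part of this answer. If you don't have one yet, a hypothetical stand-in that just checks a few markers in the HTML (only so the sample compiles) could look like this:

// Hypothetical stand-in for the asker's GetCMS helper; replace with your own detection logic.
private string GetCMS(string html)
{
    if (html.IndexOf("/wp-content/", StringComparison.OrdinalIgnoreCase) >= 0) return "WordPress";
    if (html.IndexOf("Joomla", StringComparison.OrdinalIgnoreCase) >= 0) return "Joomla";
    if (html.IndexOf("bitrix", StringComparison.OrdinalIgnoreCase) >= 0) return "1C-Bitrix";
    return "unknown";
}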
This alone should already be faster than your existing code. To improve performance further, startSearch() needs some concurrency:
using (StreamReader sr = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize))
using (SemaphoreSlim semaphore = new SemaphoreSlim(Environment.ProcessorCount * 2))
{
List<Task<string[]>> tasks = new List<Task<string[]>>();
while (!sr.EndOfStream)
{
await semaphore.WaitAsync();
tasks.Add(cURL(sr.ReadLine().Trim(), semaphore));
if (tasks.Count == 1000 || sr.EndOfStream) // flush results
{
string[][] results = await Task.WhenAll(tasks);
using (StreamWriter sw = new StreamWriter(outPath + "\\" + outName + ".csv", true))
{
foreach (string[] r in results)
{
if (r != null) sw.WriteLine(string.Join(", ", r));
}
}
tasks.Clear();
}
textBox4.Text = count.ToString();
count++;
}
}
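If you can target .NET 6 or later, Parallel.ForEachAsync can handle the throttling and batching with less bookkeeping. This is only a sketch, not a drop-in replacement for the loop above; the file name "links.txt", the output file "results.csv", and the limit of 50 concurrent requests are placeholders you should adapt:

// Sketch for .NET 6+: Parallel.ForEachAsync limits the number of concurrent requests itself.
// Needs using System.Collections.Concurrent;
var results = new ConcurrentBag<string[]>();
var options = new ParallelOptions { MaxDegreeOfParallelism = 50 };
var gate = new SemaphoreSlim(options.MaxDegreeOfParallelism); // keeps cURL's Release() balanced

await Parallel.ForEachAsync(File.ReadLines("links.txt"), options, async (line, ct) =>
{
    if (string.IsNullOrWhiteSpace(line)) return;
    await gate.WaitAsync(ct);
    string[] r = await cURL(line.Trim(), gate); // cURL releases the semaphore in its finally block
    if (r != null) results.Add(r);
});

using (var sw = new StreamWriter("results.csv"))
{
    foreach (string[] r in results)
        sw.WriteLine(string.Join(", ", r));
}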