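I am running code to download a large number of documents, typically tax statements, from county websites. At first the code seems fast and efficient, and it works well until the file count reaches about 200. That is when performance starts to degrade. If I let it keep running it still works, but it slows to a crawl. I usually have to stop it, figure out which files have not been downloaded yet, and start over.
Any help making this process faster, more efficient, and smoother regardless of the file count would be greatly appreciated.
I have long been convinced that the performance problem comes from writing the results to an html file immediately. I tried storing the results in a StringBuilder until the downloads were complete, but of course I ran out of memory.
I also tried adjusting MaxDegreeOfParallelism. Lowering it to 5 made little difference, and the performance problem tied to the file count remains.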
private void Run_Mass_TaxBillDownload()
{
    string strTag = null;
    string county = countyName.SelectedItem.ToString() + "-";
    //Converting urlList to uriList...
    List<Uri> uriList = new List<Uri>();
    foreach (string url in TextViewer.Lines) //TextViewer is a textbox where urls to be downloaded are stored...
    {
        if (url.Length > 5)
        {
            Uri myUri = new Uri(url.Trim(), UriKind.RelativeOrAbsolute);
            uriList.Add(myUri);
        }
    }
    Parallel.ForEach(uriList, new ParallelOptions { MaxDegreeOfParallelism = 5 }, str =>
    {
        using (WebClient client = new WebClient())
        {
            //Extracting taxbill numbers from the url to use as file names in the saved file...
            string FirstString = null;
            string LastString = null;
            if (str.ToString().ToLower().Contains("&tptick")) { FirstString = "&TPTICK="; LastString = "&TPSX="; }
            if (str.ToString().ToLower().Contains("&ticket=")) { FirstString = "&ticket="; LastString = "&ticketsuff="; }
            if (str.ToString().ToLower().Contains("demandbilling")) { FirstString = "&ticketNumber="; LastString = "&ticketSuffix="; }
            //Start downloading...
            client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
            client.DownloadStringCompleted += new DownloadStringCompletedEventHandler(clientTaxBill_DownloadStringCompleted);
            client.DownloadStringAsync(str, county + (Between(str.ToString(), FirstString, LastString)));
        }
    });
}

private static void clientTaxBill_DownloadStringCompleted(Object sender, DownloadStringCompletedEventArgs e)
{
    //Creating Output file....
    string deskTopPath = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
    string outputPath = deskTopPath + "\\Downloaded Tax Bills";
    string errOutputFile = outputPath + "\\errorReport.txt";
    string results = null;
    string taxBillNum = e.UserState as string;
    try
    {
        File.WriteAllText(outputPath + "\\" + taxBillNum + ".html", e.Result.ToString());
    }
    catch
    {
        results = Environment.NewLine + "<<{ERROR}>> NOTHING FOUND FOR" + taxBillNum;
        File.AppendAllText(errOutputFile, results);
    }
}
Answer 0 (score: 1)
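Because DownloadStringAsync returns immediately, far more than 5 downloads end up running at once: each iteration just hooks up the DownloadStringCompleted callback, starts the download, and then carries on to the next iteration. So the loop never waits for any download to complete, and MaxDegreeOfParallelism does not actually limit the number of downloads in flight.
ActionBlock is your friend here, because it works much better with async code, especially when combined with HttpClient (rather than WebClient).
Try something like this: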
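// Needs: using System.Net.Http; using System.Threading.Tasks; using System.Threading.Tasks.Dataflow;
// One HttpClient shared by all requests, instead of creating a new one per URL.
private static readonly HttpClient httpClient = new HttpClient();

public static async Task Downloader()
{
    var urls = new string[] { "https://www.google.co.uk/", "https://www.microsoft.com/" };
    var ab = new ActionBlock<string>(async (url) =>
    {
        var httpResponse = await httpClient.GetAsync(url);
        var text = await httpResponse.Content.ReadAsStringAsync();
        // just write it to a file
        Console.WriteLine(text);
    }, new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = 5 });

    foreach (var url in urls)
    {
        // Waits if the block has a BoundedCapacity and is currently full.
        await ab.SendAsync(url);
    }

    ab.Complete();       // no more items will be sent
    await ab.Completion; // wait until every queued download has finished
    Console.WriteLine("Done");
    Console.ReadKey();
}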
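For the tax-bill case in the question, the Console.WriteLine step could be swapped for a file write with per-item error handling. The following is only a rough sketch under a few assumptions: the Job type, the TaxBillDownloader class, and the BoundedCapacity value of 50 are made up for illustration, and the file name is assumed to have been computed already (for example with the question's Between helper). Catching exceptions inside the delegate matters, because an unhandled exception would fault the whole ActionBlock.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class TaxBillDownloader
{
    // One shared HttpClient for every request.
    private static readonly HttpClient Http = new HttpClient();

    // Hypothetical job type: the URL to fetch and the file name to save it under.
    public sealed class Job
    {
        public Uri Url { get; }
        public string FileName { get; }
        public Job(Uri url, string fileName) { Url = url; FileName = fileName; }
    }

    public static async Task DownloadAllAsync(IEnumerable<Job> jobs, string outputPath)
    {
        Directory.CreateDirectory(outputPath);
        string errorFile = Path.Combine(outputPath, "errorReport.txt");
        var errorLock = new object();

        var block = new ActionBlock<Job>(async job =>
        {
            try
            {
                string html = await Http.GetStringAsync(job.Url);
                string target = Path.Combine(outputPath, job.FileName + ".html");
                using (var writer = new StreamWriter(target))
                {
                    await writer.WriteAsync(html);
                }
            }
            catch (Exception ex)
            {
                // Record the failure and keep going; several workers may fail at once,
                // so appends to the shared error file are serialized.
                lock (errorLock)
                {
                    File.AppendAllText(errorFile,
                        "<<{ERROR}>> " + job.FileName + ": " + ex.Message + Environment.NewLine);
                }
            }
        }, new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 5, // at most 5 downloads in flight
            BoundedCapacity = 50        // assumed value; caps how many jobs are buffered
        });

        foreach (var job in jobs)
        {
            await block.SendAsync(job); // waits while the block is full
        }

        block.Complete();
        await block.Completion;
    }
}
```

Because each page is written to disk as soon as it arrives, nothing accumulates in memory the way the StringBuilder attempt did.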
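MaxDegreeOfParallelism = 5 means that at most 5 downloads are processed concurrently (not necessarily 5 dedicated threads, since the delegate is async). The await ab.SendAsync(url); call is important: if you want to limit the buffer size with BoundedCapacity = n, SendAsync will wait until there is room, whereas the ab.Post() method will not. If there is no room, Post simply returns false.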
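The difference between Post and SendAsync on a full block can be seen in a small, network-free sketch. The class name BoundedCapacityDemo and the TaskCompletionSource used to hold the first item "in progress" are assumptions made purely for this illustration.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedCapacityDemo
{
    // Returns (first Post result, second Post result, SendAsync result).
    public static async Task<(bool, bool, bool)> RunAsync()
    {
        // The gate keeps the first item "in progress" so the block stays full.
        var gate = new TaskCompletionSource<bool>();

        var block = new ActionBlock<int>(async n =>
        {
            await gate.Task;
        }, new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 1,
            BoundedCapacity = 1 // room for a single item, queued or being processed
        });

        bool first = block.Post(1);   // true: there is room
        bool second = block.Post(2);  // false: the block is full, the item is dropped

        // SendAsync does not give up; its task completes once the item is accepted.
        Task<bool> pending = block.SendAsync(3);

        gate.SetResult(true);         // let item 1 finish, which frees a slot
        bool third = await pending;   // true: item 3 was accepted after waiting

        block.Complete();
        await block.Completion;
        return (first, second, third);
    }

    public static async Task Main()
    {
        var (first, second, third) = await RunAsync();
        Console.WriteLine($"Post #1: {first}, Post #2: {second}, SendAsync: {third}");
    }
}
```

With Post, item 2 is silently lost unless the caller checks the return value, which is why the loop uses await SendAsync when a BoundedCapacity is set.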