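I have been digging around the web for a while and have not found a code example that solves my problem. I have looked at sample code, but I still don't "get" it.
I have read
http://msdn.microsoft.com/en-us/library/aa480507.aspx and
http://msdn.microsoft.com/en-us/library/dd781401.aspx
but I can't seem to get it to work.
I am using HtmlAgilityPack.
Currently I make up to 20 web requests.
When a request completes, its result is added to a dictionary, and then a method searches that dictionary for the information. If it is found, the code exits; if not, it makes another web request, and so on until it reaches the limit. I need to be able to stop the asynchronous calls on all threads once everything has been found.
Like this: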
public void FetchAndParseAllPages()
{
    PageFetcher fetcher = new PageFetcher();
    for (int i = 0; i < _maxSearchDepth; i += _searchIncrement)
    {
        string keywordNsearch = _keyword + i;
        ParseHtmldocuments(fetcher.GetWebpage(keywordNsearch));
        // This checks whether the information was found or not.
        // If found, stop, exit, and add it to the database.
        if (GetPostion() != 201)
        {
            // ADD DATA TO DATABASE
            InsertRankingData(DocParser.GetSearchResults(), _theSearchedKeyword);
            return;
        }
    }
}
This is the class that fetches the page:
public HtmlDocument GetWebpage(string urlToParse)
{
    System.Net.ServicePointManager.Expect100Continue = false;
    HtmlWeb htmlweb = new HtmlWeb();
    htmlweb.PreRequest = new HtmlAgilityPack.HtmlWeb.PreRequestHandler(OnPreRequest);
    // Pass the urlToParse variable itself; the literal @"urlToParse" would request the wrong address.
    HtmlDocument htmldoc = htmlweb.Load(urlToParse, "38.69.197.71", 45623, "PROXYUSER", "PROXYPASSWORD");
    return htmldoc;
}

public bool OnPreRequest(HttpWebRequest request)
{
    // request.UserAgent = RandomUseragent();
    request.KeepAlive = false;
    request.Timeout = 100000;            // milliseconds
    request.ReadWriteTimeout = 1000000;  // milliseconds
    request.ProtocolVersion = HttpVersion.Version10;
    return true; // ok, go on
}
How can I make this asynchronous and make it really fast? Or should I even be using threads when going asynchronous?
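As a rough illustration of the question above: for I/O-bound work such as web requests, starting every request as a Task and awaiting them together usually gives the overlap without creating threads by hand. This is only a minimal sketch, assuming .NET 4.5 or later. `FakeFetchAsync` is a hypothetical stand-in that simulates a slow download with `Task.Delay`; it is not part of HtmlAgilityPack, and `OverlapDemo` and `FetchAllAsync` are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class OverlapDemo
{
    // Hypothetical stand-in for a page download: waits, then returns fake HTML.
    public static async Task<string> FakeFetchAsync(string query, int delayMs)
    {
        await Task.Delay(delayMs);   // no thread is blocked while waiting
        return "<html>" + query + "</html>";
    }

    // Starts all requests first, then awaits them together.
    // Task.WhenAll returns the results in the same order as the input tasks.
    public static async Task<string[]> FetchAllAsync(IEnumerable<string> queries, int delayMs)
    {
        List<Task<string>> tasks = queries.Select(q => FakeFetchAsync(q, delayMs)).ToList();
        return await Task.WhenAll(tasks);
    }

    public static void Main()
    {
        // 20 simulated requests, mirroring the keyword + offset scheme in the question.
        IEnumerable<string> queries = Enumerable.Range(0, 20).Select(i => "keyword" + (i * 10));
        Stopwatch sw = Stopwatch.StartNew();
        string[] pages = FetchAllAsync(queries, 200).GetAwaiter().GetResult();
        sw.Stop();
        Console.WriteLine(pages.Length + " pages in " + sw.ElapsedMilliseconds + " ms");
    }
}
```

Because the simulated requests overlap, the total time should be close to one delay rather than twenty of them. A real download would replace the `Task.Delay` call, and actual timings will depend on the server, the proxy, and the connection limit.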
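Answer 0 (score: 0)
OK, I solved it! At least I think so! Execution time dropped to about seven seconds. Without async it took about 30 seconds.
Here is my code for future reference. EDIT: I used a console project to test the code. I am also using HtmlAgilityPack. This is how I did it; any tips on how to optimize it further would be welcome.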
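// Requires: using System; using System.Collections.Generic;
//           using System.Threading; using HtmlAgilityPack;
// Note: Delegate.BeginInvoke is only supported on .NET Framework, not on .NET Core / .NET 5+.
public delegate HtmlDocument FetchPageDelegate(string url);

static void Main(string[] args)
{
    // Raise the per-host connection limit so the requests can actually run in parallel.
    System.Net.ServicePointManager.DefaultConnectionLimit = 10;
    FetchPageDelegate del = new FetchPageDelegate(FetchPage);
    List<HtmlDocument> htmllist = new List<HtmlDocument>();
    List<IAsyncResult> results = new List<IAsyncResult>();
    List<WaitHandle> waitHandles = new List<WaitHandle>();
    DateTime start = DateTime.Now;

    // Start 20 fetches without waiting for any of them.
    for (int i = 0; i < 200; i += 10)
    {
        string url = @"URLSTOPARSE YOU CHANGE IT HERE READ FROM LIST OR ANYTHING";
        IAsyncResult result = del.BeginInvoke(url, null, null);
        results.Add(result);
        waitHandles.Add(result.AsyncWaitHandle);
    }

    // Block until every fetch has finished (WaitAll accepts at most 64 handles).
    WaitHandle.WaitAll(waitHandles.ToArray());

    // Collect the documents; EndInvoke rethrows any exception raised in FetchPage.
    foreach (IAsyncResult asyncResult in results)
    {
        htmllist.Add(del.EndInvoke(asyncResult));
    }

    Console.WriteLine("Fetched {0} pages in {1}", htmllist.Count, DateTime.Now - start);
    Console.ReadLine();
}

static HtmlDocument FetchPage(string url)
{
    HtmlWeb htmlweb = new HtmlWeb();
    HtmlDocument htmldoc = htmlweb.Load(url);
    return htmldoc;
}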
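The code above waits for every request to finish before looking at any result, while the original goal was to stop as soon as the information is found. One possible way to do that, sketched here under the assumption of .NET 4.5 or later, is to handle tasks as they complete with `Task.WhenAny` and cancel the rest through a `CancellationTokenSource`. `SearchPageAsync` is a hypothetical stand-in that simulates fetching and parsing one page; `EarlyExitDemo`, `FindFirstAsync`, and the `targetOffset` parameter are likewise invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class EarlyExitDemo
{
    // Hypothetical stand-in for fetching and parsing one result page.
    // Returns the page offset if the target was found on it, otherwise -1.
    public static async Task<int> SearchPageAsync(int offset, int targetOffset, CancellationToken token)
    {
        await Task.Delay(50 + offset, token);   // simulated download; cancelled via the token
        return offset == targetOffset ? offset : -1;
    }

    // Starts every page at once, handles results as they complete,
    // and cancels the remaining requests on the first hit.
    public static async Task<int> FindFirstAsync(int maxDepth, int increment, int targetOffset)
    {
        using (var cts = new CancellationTokenSource())
        {
            var pending = new List<Task<int>>();
            for (int i = 0; i < maxDepth; i += increment)
                pending.Add(SearchPageAsync(i, targetOffset, cts.Token));

            while (pending.Count > 0)
            {
                Task<int> done = await Task.WhenAny(pending);
                pending.Remove(done);
                int result = await done;   // would rethrow if this request had failed
                if (result >= 0)
                {
                    cts.Cancel();          // ask the remaining requests to stop
                    return result;
                }
            }
            return -1;                     // searched every page without a hit
        }
    }

    public static void Main()
    {
        int found = FindFirstAsync(200, 10, 30).GetAwaiter().GetResult();
        Console.WriteLine(found >= 0 ? "found at offset " + found : "not found");
    }
}
```

With a blocking call such as `HtmlWeb.Load`, a token alone would probably only prevent requests that have not started yet; stopping one that is already in flight would need something like aborting the underlying `HttpWebRequest`. The sketch also lets a failed request propagate its exception, which may or may not be the behavior you want.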