Question

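I'm building a crawler, and I'm using aBot to do it. It's a really nice system :) During development I ran into a problem that has more to do with the way I want to build my crawler than with the aBot project itself, but I hope you can help me.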

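When setting up the crawler, I specify the method to be called when a page crawl completes. There are synchronous and asynchronous options.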

        crawler.PageCrawlCompleted += crawler_ProcessPageCrawlCompleted;
        crawler.PageCrawlCompletedAsync += crawler_ProcessPageCrawlCompleted;

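I want to use the asynchronous one, because then I can crawl another URL while the previous one is still being processed. This works fine until I reach the last URL. When the last page is crawled, the async completed event is raised and the crawler has finished its work, so the crawl returns and the program closes before the crawler_ProcessPageCrawlCompleted handler has run to completion. As a result, I cannot guarantee that the last URL will be processed.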

Is there any way to wait for the last event to finish before closing the application? Is this a design flaw?

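Edit: I forgot to mention that I do have access to the crawler code. My current workaround is: if the link is the last one to be processed, create a WaitHandle and wait for it to complete. It sounds a bit messy, though...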

Answer 1

A ManualResetEvent could be one solution:

In your calling method:

//Declare the reset event as a field so the event handler can reach it
ManualResetEvent mre = new ManualResetEvent(false);

//Call the async method and subscribe to the event 
crawler.PageCrawlCompletedAsync += crawler_ProcessPageCrawlCompleted;

//The application will wait here until the mre is set.
mre.WaitOne();

In your event handler:

private void crawler_ProcessPageCrawlCompleted(...)
{
   ....
   mre.Set();
}

Another approach could be a CountdownEvent. Suppose you need to crawl 10 pages:

CountdownEvent countdown = new CountdownEvent (10);

//Subscribe to the event 
crawler.PageCrawlCompletedAsync += crawler_ProcessPageCrawlCompleted;

//Call the async method 10 times
....

//Wait for all events to complete
countdown.Wait();

In the handler:

private void crawler_ProcessPageCrawlCompleted(...)
{
    ....
    countdown.Signal();
}
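
In a real crawl the number of pages is usually not known up front, so a fixed count of 10 may not fit. One possible variation, given that the asker can change the crawler code, is to start the CountdownEvent at 1, call AddCount() before each handler is started, and release the initial count once no more pages will be queued. The sketch below is only an illustration under those assumptions: FakeCrawler, PageCompletedAsync and the URLs are hypothetical stand-ins and not part of the aBot API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a crawler; this is NOT the aBot API.
public class FakeCrawler
{
    public event Action<string> PageCompletedAsync;

    // Counts handlers that are still running. It starts at 1 so the count
    // cannot reach zero while the crawl is still queuing pages.
    private readonly CountdownEvent pending = new CountdownEvent(1);

    public void Crawl(string[] urls)
    {
        foreach (string url in urls)
        {
            Action<string> handler = PageCompletedAsync;
            if (handler == null) continue;

            pending.AddCount();               // register before the work starts
            Task.Run(() =>
            {
                try { handler(url); }
                finally { pending.Signal(); } // release even if the handler throws
            });
        }

        pending.Signal(); // release the initial count: nothing else will be queued
        pending.Wait();   // block until every handler has returned
    }
}

public static class Program
{
    public static int Processed;

    public static void Main()
    {
        Processed = 0;
        var crawler = new FakeCrawler();
        crawler.PageCompletedAsync += url =>
        {
            Thread.Sleep(100);                     // simulate slow processing
            Interlocked.Increment(ref Processed);
        };

        crawler.Crawl(new[] { "http://a.example/", "http://b.example/", "http://c.example/" });
        Console.WriteLine(Processed);              // prints 3
    }
}
```

Because Crawl only returns after Wait() succeeds, the application cannot exit while the handler for the last URL is still running.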

How do I wait for an async event to complete before closing the application?
