I am scraping with Scrapy and Selenium.
When I run my spider and watch it in htop, I can see my webdriver instances.
When should I close the webdriver in my code?
def parse(self, response):
    # All links are already collected in self.array_links
    for link in self.array_links:
        self.driver.get(link)
        # Parse the products here
        item = MyTestItem()
        item['test1'] = "test"
        yield item
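To quit the driver at the end of the script, I added this to my code:
def __del__(self):
    self.driver.quit()
But I don't know whether I should also close the webdriver after fetching each link.
Thanks,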
Answer 0 (score: 1)
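The question is: where do you open it?
If your webdriver lives in the spider's context, then ideally you open it when the spider opens and close it when the spider closes.
You can do this by connecting to the spider_opened and spider_closed signals: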
from scrapy import Spider, signals
from selenium import webdriver


class MySpider(Spider):
    name = "spideroo"

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # Tie the driver's lifetime to the spider's lifetime
        crawler.signals.connect(spider.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        return spider

    def spider_opened(self, spider):
        self.driver = webdriver.Chrome()  # or whichever driver class you use

    def spider_closed(self, spider):
        # quit() ends the whole session; close() only closes the current window
        self.driver.quit()
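On the second half of the question (whether to close after every link): with the approach above there should be no need to, since one driver is reused for all links and shut down once. Here is a rough sketch of that lifecycle that runs without Scrapy or a browser. FakeDriver, LifecycleSpider and run are hypothetical stand-ins made up for illustration; they are not part of Scrapy or Selenium.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium driver: records calls, opens no browser."""

    def __init__(self, log):
        self.log = log
        self.log.append("open")

    def get(self, url):
        self.log.append(f"get {url}")

    def quit(self):
        self.log.append("quit")


class LifecycleSpider:
    """Not a real Scrapy spider; it only mirrors the opened/parse/closed hooks."""

    def __init__(self, links, log):
        self.array_links = links
        self.log = log
        self.driver = None

    def spider_opened(self):
        # One driver for the whole crawl
        self.driver = FakeDriver(self.log)

    def parse(self):
        for link in self.array_links:
            # Reuse the same driver for every link; no close in between
            self.driver.get(link)
            yield {"test1": "test", "url": link}

    def spider_closed(self):
        # Shut the driver down exactly once, and tolerate being called twice
        if self.driver is not None:
            self.driver.quit()
            self.driver = None


def run(links):
    """Drive the hooks in the order a crawl would: open, parse, close."""
    log = []
    spider = LifecycleSpider(links, log)
    spider.spider_opened()
    try:
        items = list(spider.parse())
    finally:
        spider.spider_closed()
    return items, log
```

Calling run(["a", "b"]) records a single "open", one "get" per link, and a single "quit". As for __del__, Python does not promise when (or, at interpreter exit, whether) it runs, so an explicit close hook is probably the safer place for quit().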