I can't figure out why my close method is not being executed. I have two lists of URLs to process: the first list must be processed and exported first, and only then the second list.
The problem is that the close method is only called (a breakpoint on the def line is hit) but its body is never executed. Do you know why?
```python
# coding=utf-8
from bot.items import TestItem
from scrapy import Spider, Request, signals
from scrapy.exceptions import DontCloseSpider
from scrapy.xlib.pydispatch import dispatcher


class IndexSpider(Spider):
    name = 'index_spider'
    allowed_domains = ['www.doman.org']

    def start_requests(self):
        for url in ["https://www.doman.org/eshop"]:
            yield Request(url, callback=self.parse_main_page)

    def parse_main_page(self, response):
        self.categories = [some tuples]
        self.subcategories = [some tuples]

    def close(self, spider):  # Execution ends here
        pass  # This is not being executed
        if self.categories:
            for cat in self.categories:
                url = "https://www.doman.org/search/getAjaxResult?categoryId={}".format(cat[0])
                yield Request(url, meta={'tup': cat, 'priority': 0}, priority=0, callback=self.parse_category)
            self.categories = []
        raise DontCloseSpider
```
Answer 0 (score: 0)

The close method is a static method: https://github.com/scrapy/scrapy/blob/master/scrapy/spiders/__init__.py#L101, so your close method's signature does not match.
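For reference, Scrapy's `Spider.close` is a staticmethod shim that delegates to a `closed(reason)` hook on the spider. The sketch below mimics that call chain with a stand-in base class (`MiniSpider` is not Scrapy itself) to show why the documented `closed(self, reason)` hook, with the matching signature, does get the close reason:

```python
class MiniSpider:
    """Stand-in mimicking Scrapy's base Spider.close staticmethod."""

    @staticmethod
    def close(spider, reason):
        # Framework-side shim: delegate to a closed() hook if the spider defines one.
        closed = getattr(spider, "closed", None)
        if callable(closed):
            return closed(reason)


class MySpider(MiniSpider):
    def closed(self, reason):
        # The per-spider hook with the expected (self, reason) signature.
        return "closed because: {}".format(reason)


spider = MySpider()
result = MiniSpider.close(spider, "finished")  # how the framework invokes it
```

Defining `close(self, spider)` instead means the framework's positional call passes the close reason into your `spider` parameter, which is why the signatures must line up.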
Answer 1 (score: 0)

I think you need to register the function like this:
```python
class IndexSpider(Spider):

    def __init__(self, *args, **kwargs):
        dispatcher.connect(self.spider_closed, signals.spider_closed)
        super(IndexSpider, self).__init__(*args, **kwargs)

    def spider_closed(self, spider):
        pass  # This is not being executed
        if self.categories:
            for cat in self.categories:
                url = "https://www.doman.org/search/getAjaxResult?categoryId={}".format(cat[0])
                yield Request(url, meta={'tup': cat, 'priority': 0}, priority=0, callback=self.parse_category)
            self.categories = []
        raise DontCloseSpider
```
Also, I am not sure whether you can send more requests from inside the spider_closed handler, because the spider is already closing at that point.
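If the goal is to start a second batch of requests once the first list is exhausted, the usual place is a spider_idle handler that schedules the new work and raises DontCloseSpider; spider_closed fires too late for that. Here is a minimal runnable sketch of the two-phase pattern, using stand-in classes rather than real Scrapy objects (the second-phase URL is hypothetical):

```python
class DontCloseSpider(Exception):
    """Stand-in for scrapy.exceptions.DontCloseSpider."""


class TwoPhaseSpider:
    def __init__(self):
        self.phase = 1
        self.second_urls = ["https://www.doman.org/page2"]  # hypothetical second list
        self.scheduled = []  # stands in for crawler.engine.crawl(...)

    def spider_idle(self):
        # Fired when the scheduler runs dry; queue phase two and keep the spider open.
        if self.phase == 1 and self.second_urls:
            self.phase = 2
            self.scheduled.extend(self.second_urls)
            self.second_urls = []
            raise DontCloseSpider  # tells Scrapy not to shut down yet


spider = TwoPhaseSpider()
try:
    spider.spider_idle()  # first idle: schedules phase two
except DontCloseSpider:
    pass
spider.spider_idle()  # second idle: nothing left, spider may close
```

In real Scrapy you would connect the handler to signals.spider_idle and schedule the new requests through the crawler engine.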
In your case, I would suggest removing all the code from the spider_closed method and just writing a log message like this:
```python
import logging

def spider_closed(self, spider):
    logging.info("spider_closed() called")
```
That way you will know that spider_closed is being called, and then you can try sending a Request from inside that method.
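One more note: scrapy.xlib.pydispatch has since been removed (Scrapy 2.0), and the supported way to register such a handler is crawler.signals.connect() inside from_crawler(). The sketch below mimics that wiring with stand-in Crawler and SignalManager objects so the pattern is runnable here; in real code you would use Scrapy's own classes:

```python
class SignalManager:
    """Stand-in for scrapy.signalmanager.SignalManager."""

    def __init__(self):
        self._handlers = {}

    def connect(self, receiver, signal):
        self._handlers.setdefault(signal, []).append(receiver)

    def send(self, signal, **kwargs):
        for receiver in self._handlers.get(signal, []):
            receiver(**kwargs)


spider_closed = object()  # stand-in for scrapy.signals.spider_closed


class Crawler:
    def __init__(self):
        self.signals = SignalManager()


class IndexSpider:
    def __init__(self):
        self.close_reasons = []

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # The Scrapy-recommended hook for wiring up signal handlers.
        spider = cls(*args, **kwargs)
        crawler.signals.connect(spider.spider_closed, signal=spider_closed)
        return spider

    def spider_closed(self, reason):
        self.close_reasons.append(reason)


crawler = Crawler()
spider = IndexSpider.from_crawler(crawler)
crawler.signals.send(spider_closed, reason="finished")
```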