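I am using Scrapy to crawl search results.
I have a variable, search_page, in my parse function that tells me which page number I am on.
I want the crawler to stop crawling once search_page > 500.
How can I do that?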
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@class="headline_area"]')
    items = []
    for site in sites[:5]:
        item = StackItem()
        # ivar and nextlink are defined elsewhere (not shown here)
        log.msg(' LOOP' + str(ivar), level=log.ERROR)
        item['title'] = "yoo ma"
        request = Request("blabla", callback=self.test1)
        request.meta['item'] = item
        page_number = nextlink.split("&")[-3].split("=")[-1]
        if page_number > 500:
            STOP  # placeholder: this is where the crawl should stop
        ivar = ivar + 1
        yield request
Answer 0 (score: 4)
https://scrapy.readthedocs.org/en/latest/topics/exceptions.html?highlight=closeSpider
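Raise the CloseSpider exception from your callback. Note that page_number comes out of split() as a string, so convert it with int() before comparing it to 500:
from scrapy.exceptions import CloseSpider

if int(page_number) > 500:
    raise CloseSpider('Search Exceeded 500')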
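
If the positional split("&")[-3] lookup turns out to be fragile (it breaks when the order of the query parameters changes), the page check could be pulled into a small helper. This is only a sketch: the parameter name "page", the example.com URLs, and the helper names page_number_from_link and exceeds_limit are assumptions for illustration, not part of the question or of Scrapy.

```python
from urllib.parse import urlparse, parse_qs

MAX_PAGE = 500  # limit taken from the question


def page_number_from_link(link, param="page"):
    """Return the integer value of `param` in the link's query string.

    Returns None when the parameter is missing or is not a number.
    The parameter name "page" is a hypothetical default.
    """
    values = parse_qs(urlparse(link).query).get(param)
    if not values:
        return None
    try:
        return int(values[0])
    except ValueError:
        return None


def exceeds_limit(link, limit=MAX_PAGE, param="page"):
    """True only when the link carries a page number greater than `limit`."""
    page = page_number_from_link(link, param)
    return page is not None and page > limit


# Inside the spider callback this would be used roughly as:
#
#     from scrapy.exceptions import CloseSpider
#
#     if exceeds_limit(nextlink):
#         raise CloseSpider('Search Exceeded 500')
```

Keeping the parsing outside the callback means it can be checked without running a crawl, and a link with no page number simply does not trigger the stop.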