I have a scenario where I browse a shop across 10 pages. When I find an item I want, I add it to the basket.
Finally I want to check out. The problem is that with chained Scrapy requests, checkout gets called once per item in the basket.
How can I merge the chained requests into one, so that after adding the 10 items to the basket, checkout is called only once?
def start_requests(self):
    params = getShopList()
    for param in params:
        yield scrapy.FormRequest('https://foo.bar/shop', callback=self.addToBasket,
                                 method='POST', formdata=param,
                                 meta={'param': param})

def addToBasket(self, response):
    param = response.meta['param']
    yield scrapy.FormRequest('https://foo.bar/addToBasket', callback=self.checkoutBasket,
                             method='POST', formdata=param,
                             meta={'param': param})

def checkoutBasket(self, response):
    # Problem: this callback runs once per item, so checkout fires 10 times
    param = response.meta['param']
    yield scrapy.FormRequest('https://foo.bar/checkout', callback=self.final,
                             method='POST', formdata=param)

def final(self, response):
    print("Success, you have purchased 59 items")
Edit:
I tried issuing the request in the closed event, but it never executes the request or hits the callback:
def closed(self, reason):
    if reason == "finished":
        print("spider finished")
        return scrapy.Request('https://www.google.com', callback=self.finalmethod)
    print("Spider closed but not finished.")

def finalmethod(self, response):
    print("finalized")
Answer 0 (score: 0)
I think you can perform the checkout manually when the spider finishes:
def closed(self, reason):
    if reason == "finished":
        return requests.post(checkout_url, data=param)
    print("Spider closed but not finished.")
See the documentation for closed.
import requests
import scrapy


class MySpider(scrapy.Spider):
    name = 'whatever'

    def start_requests(self):
        params = getShopList()
        for param in params:
            yield scrapy.FormRequest('https://foo.bar/shop', callback=self.addToBasket,
                                     method='POST', formdata=param,
                                     meta={'param': param})

    def addToBasket(self, response):
        param = response.meta['param']
        yield scrapy.FormRequest('https://foo.bar/addToBasket',
                                 method='POST', formdata=param)

    def closed(self, reason):
        # closed() runs after the crawl has ended, so use a plain requests.post here;
        # checkout_url and the checkout form data must be defined elsewhere (e.g. stored on self)
        if reason == "finished":
            return requests.post(checkout_url, data=param)
        print("Spider closed but not finished.")
Answer 1 (score: 0)
I solved it by using Scrapy signals and the spider_idle hook.
The spider_idle signal is sent when the spider goes idle, which means there are no further requests pending:
https://doc.scrapy.org/en/latest/topics/signals.html
import scrapy
from scrapy import signals


class MySpider(scrapy.Spider):
    name = 'whatever'

    def start_requests(self):
        self.crawler.signals.connect(self.spider_idle, signals.spider_idle)  ## notice this
        params = getShopList()
        for param in params:
            yield scrapy.FormRequest('https://foo.bar/shop', callback=self.addToBasket,
                                     method='POST', formdata=param,
                                     meta={'param': param})

    def addToBasket(self, response):
        param = response.meta['param']
        yield scrapy.FormRequest('https://foo.bar/addToBasket',
                                 method='POST', formdata=param)

    def spider_idle(self, spider):  ## called once all pending requests are finished
        req = scrapy.Request('https://foo.bar/checkout', callback=self.checkoutFinished)
        self.crawler.engine.crawl(req, spider)

    def checkoutFinished(self, response):
        print("Checkout finished")
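The idea behind spider_idle can be illustrated outside of Scrapy: a scheduler drains its queue of pending requests and only then fires a one-shot idle callback, so "checkout" runs exactly once no matter how many items were queued. This is a minimal stdlib-only sketch; the Scheduler class and its method names are hypothetical and are not part of the Scrapy API:

```python
from collections import deque


class Scheduler:
    """Toy analogue of Scrapy's engine: process every queued request,
    then fire the registered idle callbacks exactly once."""

    def __init__(self):
        self.queue = deque()
        self.idle_callbacks = []

    def enqueue(self, request):
        # request is any zero-argument callable, like a Scrapy callback
        self.queue.append(request)

    def on_idle(self, callback):
        # analogue of connecting a handler to the spider_idle signal
        self.idle_callbacks.append(callback)

    def run(self):
        results = []
        while self.queue:
            results.append(self.queue.popleft()())
        # queue drained: the "spider" is idle, so checkout fires once
        for cb in self.idle_callbacks:
            results.append(cb())
        return results


sched = Scheduler()
for item in ["hat", "shoe", "bag"]:
    sched.enqueue(lambda i=item: f"addToBasket:{i}")
sched.on_idle(lambda: "checkout")
print(sched.run())  # checkout appears once, after all basket additions
```

The same ordering guarantee is what the spider_idle answer relies on: the checkout request is only scheduled after the engine has run out of pending basket requests.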