I'm not sure how Scrapy works. I have built an almost-working spider. I have a list of dicts (config.products), and each dict contains data that has to be sent as a POST in the function initial_search. So initial_search needs to be called several times, but right now the POST sent by initial_search is only performed once and then the spider closes. I added dont_filter=True, but that did not change anything. Does anyone know what is going wrong?
def parse(self, response):
    return scrapy.FormRequest.from_response(
        response,
        meta={'product': config.products[0]},
        callback=self.initial_search
    )

def initial_search(self, response):
    config.actualProduct = response.meta['product']
    if config.products.index(config.actualProduct) == 0:
        config.savedResponse = response
    # The second time, the request is not made (even with dont_filter=True)
    return scrapy.FormRequest(
        url=response.url,
        formdata=dictArgs,
        meta={'dictArgs': config.actualProduct},
        dont_filter=True,
        callback=self.other_function
    )

def other_function(self, response):
    return scrapy.FormRequest(
        url=response.url,
        formdata=dictArgs,
        meta={'dictArgs': config.actualProduct},
        callback=self.other_function2
    )

def other_function2(self, response):
    nextPosition = config.products.index(config.actualProduct) + 1
    # Checking if we have another dict to post
    if nextPosition < len(config.products):
        config.savedResponse.meta['product'] = config.products[nextPosition]
        self.initial_search(config.savedResponse)
Any help would be appreciated.
Answer 0 (score: 0)
Actually, you are not calling initial_search correctly from other_function2. Scrapy only schedules requests that are returned or yielded from a callback; calling self.initial_search(config.savedResponse) directly just runs the method and throws away the FormRequest it returns, so the engine has nothing left to crawl and closes the spider. This is how it should look:
def other_function2(self, response):
    nextPosition = config.products.index(config.actualProduct) + 1
    # Checking if we have another dict to post
    if nextPosition < len(config.products):
        config.savedResponse.meta['product'] = config.products[nextPosition]
        # Yield a new Request instead of calling self.initial_search() directly.
        # scrapy.Request expects a URL, so pass the saved response's URL;
        # dont_filter=True keeps the duplicate filter from dropping the repeat visit.
        yield scrapy.Request(
            config.savedResponse.url,
            meta={'product': config.products[nextPosition]},
            dont_filter=True,
            callback=self.initial_search
        )