Question

I have these rules:

Rule(SgmlLinkExtractor(allow=('http://.*/category/.*/.*/.*',))),
Rule(SgmlLinkExtractor(allow=('http://.*/product/.*', )),cb_kwargs={'crumbs':response.url},callback='parse_item'),

I want to pass the first rule's response to the callback (parse_item), but this line fails with an error saying that response is not defined.

How can I access the response from the previous rule?

Answer 1

You can only access the Response object inside a callback. Try something like this:

Rule(SgmlLinkExtractor(allow=r'http://.*/category/.*/.*/.*'), callback='parse_cat', follow=True),
Rule(SgmlLinkExtractor(allow=r'http://.*/product/.*'), callback='parse_prod'),

def parse_cat(self, response):
    crumbs = response.url
    return self.parse_item(response, crumbs)

def parse_prod(self, response):
    crumbs = response.url
    return self.parse_item(response, crumbs)

def parse_item(self, response, crumbs):
    ...
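
To see why the rule in the question fails, here is a minimal sketch in plain Python. It assumes nothing about Scrapy itself: make_rule is a hypothetical stand-in for Rule that only records its arguments, and the URL is made up. The point is that the rules tuple is evaluated when the class body runs, before any page has been fetched, so the name response does not exist there. It only exists as an argument inside a callback.

```python
from types import SimpleNamespace


def make_rule(pattern, callback=None, cb_kwargs=None):
    """Hypothetical stand-in for Scrapy's Rule; it only records its arguments."""
    return {"pattern": pattern, "callback": callback, "cb_kwargs": cb_kwargs or {}}


error_message = None
try:
    class BrokenSpider:
        # This tuple is built while the class body executes, long before any
        # request is sent, so `response` is an undefined name at this point.
        rules = (
            make_rule(r"/product/", callback="parse_item",
                      cb_kwargs={"crumbs": response.url}),
        )
except NameError as exc:
    error_message = str(exc)


class WorkingSpider:
    rules = (make_rule(r"/product/", callback="parse_prod"),)

    def parse_prod(self, response):
        # Inside a callback, `response` is a parameter and is always defined.
        return self.parse_item(response, crumbs=response.url)

    def parse_item(self, response, crumbs):
        return {"url": response.url, "crumbs": crumbs}


# Simulate the framework calling the callback with a fake response object.
item = WorkingSpider().parse_prod(SimpleNamespace(url="http://example.com/product/1"))
```

The broken class never gets defined, because building its rules raises NameError. The working class defers the lookup to call time, which is the same idea as parse_cat and parse_prod above.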

Answer 2

If you want to access the product's category URL (the referring URL) in parse_item, you can get it with:

response.request.headers.get('Referer')

via: nyov on #scrapy IRC
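
A small helper along these lines might make the Referer lookup safer to use inside parse_item. This is only a sketch: referer_of is a hypothetical name, the URLs are invented, and the fake response is a stand-in so the snippet runs without a crawl. It assumes that header values may come back as bytes (as they do in Scrapy on Python 3) and that the header may be missing, for example on a start URL.

```python
from types import SimpleNamespace
from typing import Optional


def referer_of(response) -> Optional[str]:
    """Return the Referer of the request that produced `response`, or None."""
    request = getattr(response, "request", None)
    if request is None:
        return None
    value = request.headers.get("Referer")
    if value is None:
        # No Referer header, e.g. the request was not reached by following a link.
        return None
    # Header values may be bytes; decode so the caller always gets a str.
    return value.decode("utf-8") if isinstance(value, bytes) else value


# Stand-in for a real response (hypothetical URLs), with a bytes header value.
fake_response = SimpleNamespace(
    url="http://example.com/product/42",
    request=SimpleNamespace(
        headers={"Referer": b"http://example.com/category/a/b/c"}
    ),
)
crumbs = referer_of(fake_response)
```

In a spider, parse_item(self, response) could then call referer_of(response) to obtain the category URL without any cb_kwargs.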

Scrapy: get the response URL in CrawlSpider rules
