Question

Here is the code for a simple spider:

# -*- coding: utf-8 -*-

import os
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.http import Request

class AzS(CrawlSpider):

    name = 'azs'
    allowed_domains = ["******"]
    start_urls = [
        "*********" 
    ]

    rules = (
        Rule(LinkExtractor(restrict_xpaths = ("""//*[@id="yearList"]""")), callback = 'year', follow = True), # years at start url
        Rule(LinkExtractor(restrict_xpaths = ("""/html/body/div[3]/div/div[1]/div/div[2]/ul""")), callback = 'model', follow = True), # model
    )


    # start url with years list
    def year(self, response):
        yr = response.url.split('=')[-1][2:4]
        request = Request(response.url,
                          callback = self.model)
        request.meta['yr'] = yr
        return request


    # the page of the year with models list 
    def model(self, response):
        print response.meta['yr']

When run, this code produces the following error:

File "xxxxxxxxxx.py", line 33, in model
            print response.meta['yr']
        exceptions.KeyError: 'yr'

I can't figure out what causes this error, so any help is appreciated. Thanks in advance.

Answer 1

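Because you have two rules, whenever the second rule matches it fires its own requests, and their responses are also passed to model. For those requests you never set the 'yr' key in meta, which is most likely the root cause of your error.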

You can wrap the lookup in try/except, or read the key with get(), i.e. response.meta.get('yr', 'your_value'). If the key is not found, your_value is returned instead.
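As a rough sketch of the two options above: the snippet below uses plain dictionaries as stand-ins for response.meta, so it runs without Scrapy. The helper names year_strict and year_with_default and the fallback values are hypothetical and only serve the illustration.

```python
# Plain dicts stand in for response.meta; no Scrapy import is needed here.

def year_strict(meta):
    """Option 1: direct lookup guarded by try/except."""
    try:
        return meta['yr']
    except KeyError:
        return None  # hypothetical fallback when 'yr' was never set


def year_with_default(meta, default='unknown'):
    """Option 2: dict.get() returns the default instead of raising."""
    return meta.get('yr', default)


meta_with_year = {'yr': '14'}  # like a request built in year(), which sets 'yr'
meta_without_year = {}         # like a request on which 'yr' was never set

results = [
    year_strict(meta_with_year),
    year_strict(meta_without_year),
    year_with_default(meta_with_year),
    year_with_default(meta_without_year),
]
```

Inside the spider, the same idea would amount to replacing response.meta['yr'] in model with response.meta.get('yr', 'your_value'), assuming a fallback value is acceptable for pages reached without a year.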

KeyError when trying to access a request.meta key in Scrapy
