Here is the code for a simple spider:
# -*- coding: utf-8 -*-
import os
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.http import Request

class AzS(CrawlSpider):
    name = 'azs'
    allowed_domains = ["******"]
    start_urls = [
        "*********"
    ]
    rules = (
        Rule(LinkExtractor(restrict_xpaths = ("""//*[@id="yearList"]""")), callback = 'year', follow = True), # years at start url
        Rule(LinkExtractor(restrict_xpaths = ("""/html/body/div[3]/div/div[1]/div/div[2]/ul""")), callback = 'model', follow = True), # model
    )

    # start url with years list
    def year(self, response):
        yr = response.url.split('=')[-1][2:4]
        request = Request(response.url,
                          callback = self.model)
        request.meta['yr'] = yr
        return request

    # the page of the year with models list
    def model(self, response):
        print response.meta['yr']
When I run it, this code produces the following error:
File "xxxxxxxxxx.py", line 33, in model
print response.meta['yr']
exceptions.KeyError: 'yr'
I can't figure out what causes this error, so any help is appreciated. Thanks in advance.
Answer 0 (score: 1)
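You have two rules. When the second rule matches, the spider fires requests of its own and their responses arrive at model. Those requests never pass through year, so nothing sets the 'yr' key in their meta data. That is probably the root cause of your error.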
You could wrap the lookup in try/except, or read the key with get(), i.e. response.meta.get('yr', 'your_value'). If the key is not found, get() returns your_value instead of raising KeyError.
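Both options can be sketched roughly as below. This is only an illustration: response.meta is a plain dict in Scrapy, so two ordinary dicts stand in for it here, and the names read_year, read_year_strict, meta_from_year and meta_from_rule, as well as the sample value '14', are made up for the example.

```python
# Stand-ins for response.meta (hypothetical sample data, simplified).
meta_from_year = {'yr': '14'}  # request built in year(), which sets 'yr'
meta_from_rule = {}            # request fired by the second rule; 'yr' was never set


def read_year(meta, default='unknown'):
    """Return meta['yr'], or the default when the key is absent."""
    return meta.get('yr', default)


def read_year_strict(meta):
    """Same lookup with try/except, for when a missing key needs its own handling."""
    try:
        return meta['yr']
    except KeyError:
        # No 'yr' here: the response did not come from a request made in year().
        return None


print(read_year(meta_from_year))
print(read_year(meta_from_rule))
print(read_year_strict(meta_from_rule))
```

In the spider itself the same idea would be a one-line change inside model, replacing response.meta['yr'] with response.meta.get('yr', 'your_value'). Note that this only avoids the exception; responses that reach model through the second rule still carry no year.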