I'm new to Scrapy. Can anyone tell me how to pass data from an initial request to the requests that follow it? What's wrong with my code?
```python
import logging

# Scrapy 1.x import paths (SgmlLinkExtractor is deprecated)
from scrapy.linkextractors.sgml import SgmlLinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class SizeCrawler(CrawlSpider):
    name = "size-uk-crawl"
    allowed_domains = ["size.co.uk"]
    start_urls = ["http://www.size.co.uk"]

    # Set the rules for scraping all the available products of a website
    rules = (
        Rule(
            SgmlLinkExtractor(restrict_xpaths=(
                "(//*[@id='primaryNavigation']/li/span/a)[position() >= 3]",  # get all clothes, footwear and accessories
                "//*[@id='categoryMenu']//li/a")),  # get all categories
            follow=True, process_request='add_gender'
        ),
        Rule(
            SgmlLinkExtractor(restrict_xpaths=(
                "//div[@class='product-list gallery-view medium-images']/ol//h2/a")),
            callback='parse_product'
        ),
    )

    def add_gender(self, request):
        # Select the value for gender here
        logging.info(request.meta)
        gender = request.meta.get('link_text')
        if gender == 'ForWomen':
            gender = 'women'
        else:
            gender = 'men'
        request.meta['gender'] = gender
        return request

    def parse_product(self, response):
        # Problem here
        # I am not getting gender information here
        logging.info(response.meta)
        logging.info(response.request.meta)
```
Answer (score: 0)
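You set `callback='parse_product'` on one rule and `process_request='add_gender'` on the other. Are you sure that's what you want? The requests that lead to `parse_product` are extracted by the second rule, so they are never processed by `add_gender`. Scrapy also does not copy `meta` from a response onto the new requests extracted from it, so the `gender` you set on a category request never reaches the product requests found on that category page.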