Question

I'm new to Scrapy. Can anyone tell me how to pass data from the initial request to subsequent requests? What is wrong with my code?

import logging

# Import paths assume Scrapy < 1.0, the era SgmlLinkExtractor belongs to.
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor


class SizeCrawler(CrawlSpider):

    name = "size-uk-crawl"
    allowed_domains = ["size.co.uk"]
    start_urls = ["http://www.size.co.uk"]

    # Set the rules for scraping all the available products of a website
    rules = (
        Rule(
            SgmlLinkExtractor(restrict_xpaths=(
                "(//*[@id='primaryNavigation']/li/span/a)[position() >= 3]",  # get all clothes, footwear and accessories
                "//*[@id='categoryMenu']//li/a")),                            # get all categories
            follow=True, process_request='add_gender'
        ),
        Rule(
            SgmlLinkExtractor(restrict_xpaths=(
                "//div[@class='product-list gallery-view medium-images']/ol//h2/a")),
            callback='parse_product'
        ),
    )

    def add_gender(self, request):

        # Select the value for gender here
        logging.info(request.meta)
        gender = request.meta.get('link_text')
        if gender == 'ForWomen':
            gender = 'women'
        else:
            gender = 'men'
        request.meta['gender'] = gender

        return request

    def parse_product(self, response):

        # Problem here
        # I am not getting gender information here
        logging.info(response.meta)
        logging.info(response.request.meta)

Answer 1

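You set callback='parse_product' on one rule and process_request='add_gender' on the other. Are you sure that is what you want? The requests that lead to parse_product are never passed through add_gender, so their meta never gets a 'gender' key.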
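One possible way to get the value all the way to parse_product is to let every rule run the same process_request hook and have that hook copy 'gender' from the page the link was found on. The sketch below assumes Scrapy 2.0 or later, where a Rule's process_request callable receives (request, response). It is not the asker's original setup. The helper name gender_from_link_text and the SimpleNamespace stand-ins for Request/Response are hypothetical and only there so the logic can be run without a crawl.

```python
from types import SimpleNamespace


def gender_from_link_text(link_text):
    """Hypothetical helper: same mapping as in the question's add_gender."""
    return "women" if link_text == "ForWomen" else "men"


def add_gender(request, response):
    """Inherit 'gender' from the parent response, else derive it from the link text.

    Assumption: written against the Scrapy >= 2.0 hook signature
    process_request(request, response). In a spider this would be a method
    (with a leading `self`) named in every Rule, including the product rule.
    """
    inherited = response.meta.get("gender")
    if inherited is not None:
        request.meta["gender"] = inherited
    else:
        request.meta["gender"] = gender_from_link_text(request.meta.get("link_text"))
    return request


# Stand-ins for scrapy.Request / scrapy.http.Response: only `.meta` is used here.
start_page = SimpleNamespace(meta={})

# Navigation link found on the start page: gender is derived from the link text.
category_request = add_gender(SimpleNamespace(meta={"link_text": "ForWomen"}), start_page)

# In Scrapy, response.meta is the meta of the request that produced the response.
category_page = SimpleNamespace(meta=category_request.meta)

# Product link found on the category page: gender is inherited, not re-derived.
product_request = add_gender(SimpleNamespace(meta={"link_text": "Some product"}), category_page)

print(product_request.meta["gender"])
```

With the hook attached to both rules, response.meta['gender'] would then be available inside parse_product. On older Scrapy versions the hook only receives the request, so this inheritance step would have to be done differently.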
