Scrapy: passing a custom value

Time: 2017-05-16 18:58:06

Tags: python scrapy

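I have a spider that works well, but now I want to add another value to the item. The problem is that the value I need to pass, stamp, is only available in parse. Each stamp value belongs to the link I pass to get_details, but the stamp appears only on the original (listing) page. How can I modify the code so that the stamp value is added to every item I yield? Thanks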

def parse(self, response):
    # stamp is found here, on the listing page, but never reaches get_details
    stamp = response.xpath("//div[@class='byline']/time/@datetime")

    for url in response.xpath("//h2[@class='post-title']/a/@href").extract():
        yield scrapy.Request(url, callback=self.get_details)

def get_details(self, response):
    article = ArticleItem()
    article['title'] = response.xpath("//h1/text()").extract()
    article['url'] = response.url
    yield article

1 answer:

Answer 0 (score: 1)

Sure. Just pass the stamp data through the request's meta attribute, then pull it back out of the response object in the get_details method:

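import scrapy

def parse(self, response):
    # !! As I don't know the actual page these xpaths are my best guesses and need adjustments
    # Loop over each article block so every URL is paired with its own stamp
    for item in response.xpath("//li[contains(@class, 'river-block')]"):
        url = item.xpath(".//h2[@class='post-title']/a/@href").extract_first()
        stamp = item.xpath(".//time/@datetime").extract()
        if url is None:
            continue  # no post link in this block
        # Attach the stamp to the request; urljoin also handles relative links
        yield scrapy.Request(response.urljoin(url), callback=self.get_details, meta={'stamp': stamp})

def get_details(self, response):
    article = ArticleItem()
    article['title'] = response.xpath("//h1/text()").extract()
    article['url'] = response.url
    # Whatever was put in Request.meta comes back on response.meta
    article['stamp'] = response.meta['stamp']
    yield article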