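I have a spider that works very well, but now I want to add another value to the item. The problem is that the value I need to pass, stamp, is located in parse. The stamp value is associated with the link I pass to get_details, but stamp only exists on the original listing page. How can I modify the code so that the stamp value is added to every item that is yielded? Thanks.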
def parse(self, response):
    stamp = response.xpath("//div[@class='byline']/time/@datetime")
    for url in response.xpath("//h2[@class='post-title']/a/@href").extract():
        yield scrapy.Request(url, callback=self.get_details)

def get_details(self, response):
    article = ArticleItem()
    article['title'] = response.xpath("//h1/text()").extract()
    article['url'] = response.url
    yield article
Answer 0 (score: 1)
Sure, just pass the stamp data through the request's meta attribute, then pull it back out of the response object in the get_details method:
def parse(self, response):
    # !! As I don't know the actual page these xpaths are my best guesses and need adjustments
    for item in response.xpath("//li[contains(@class, 'river-block')]"):
        url = item.xpath(".//h2[@class='post-title']/a/@href").extract()[0]
        stamp = item.xpath(".//time/@datetime").extract()
        yield scrapy.Request(url, callback=self.get_details, meta={'stamp': stamp})

def get_details(self, response):
    article = ArticleItem()
    article['title'] = response.xpath("//h1/text()").extract()
    article['url'] = response.url
    article['stamp'] = response.meta['stamp']
    yield article