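I first try to get each book's link, then follow that link and get the book's title. In the end I want to store the title in one column of a CSV file and the link in another. This is how I wrote it, but I only get the links, not the titles: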
import scrapy


class AmazonSpiderSpider(scrapy.Spider):
    name = 'amazon_spider'
    allowed_domains = ['www.amazon.com']
    start_urls = ['https://www.amazon.com/s/ref=dp_bc_3?ie=UTF8&node=468216&rh=n%3A283155%2Cn%3A%212349030011%2Cn%3A465600%2C']

    def parse(self, response):
        links = response.xpath('//*[@class="a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal"]/@href').extract()
        for link in links:
            yield {'Book Urls': link}
            yield scrapy.Request(link, callback=self.book_title)

    def book_title(self, response):
        title = response.xpath('//*[@id="productTitle"]/text()').extract_first()
        yield {'Title': title}
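One likely reason for the symptom: the code above yields the link and the title as two separate items, and each yielded item becomes its own CSV row. In addition, when no field list is configured, Scrapy's CSV exporter takes its columns from the first item it sees, which here only has a `Book Urls` key, so the later title-only items come out as empty rows. The stdlib-only sketch below imitates that behaviour with `csv.DictWriter` (it is an illustration, not Scrapy's exporter code; the `to_csv` helper and the sample URL and title are made up):

```python
import csv
import io


def to_csv(items, fields=None):
    """Write one CSV row per item. If no field list is given, take the
    columns from the first item's keys, imitating a CSV exporter that has
    no explicit field list (hypothetical helper, not Scrapy's code)."""
    items = list(items)
    if fields is None:
        fields = list(items[0]) if items else []
    buf = io.StringIO()
    # restval='' fills missing columns; extrasaction='ignore' drops keys
    # that are not in the header instead of raising an error
    writer = csv.DictWriter(buf, fieldnames=fields, restval='',
                            extrasaction='ignore', lineterminator='\n')
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buf.getvalue()


# As in the question: a link item and a separate title item per book
separate = [{'Book Urls': 'https://www.amazon.com/dp/1'}, {'Title': 'Book One'}]
# One merged item per book
merged = [{'Title': 'Book One', 'Book Urls': 'https://www.amazon.com/dp/1'}]

print(to_csv(separate))  # header is only "Book Urls"; the title is lost
print(to_csv(merged))    # one row with both columns filled
```

Even with an explicit field list, the separate items still land on two half-empty rows, so the fix is to yield a single item per book that holds both the title and the link.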
Answer (score: 0)
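I solved the problem with response.meta: the link is passed to the second callback in the request's meta, and the title and link are then yielded together as one item.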
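import scrapy


class AmazonSpiderSpider(scrapy.Spider):
    name = 'amazon_spider'
    allowed_domains = ['www.amazon.com']
    start_urls = ['https://www.amazon.com/s/ref=dp_bc_3?ie=UTF8&node=468216&rh=n%3A283155%2Cn%3A%212349030011%2Cn%3A465600%2C']

    def parse(self, response):
        # Collect the product-page links from the search results page
        links = response.xpath('//*[@class="a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal"]/@href').extract()
        for link in links:
            # urljoin makes the request work even if the href is relative
            url = response.urljoin(link)
            # Carry the link along to the callback in the request's meta
            yield scrapy.Request(url, callback=self.book_title, meta={'Link': url})

    def book_title(self, response):
        # Take the first text node of the title and strip the surrounding whitespace
        title = response.xpath('//*[@id="productTitle"]/text()').extract_first(default='').strip()
        # Yield one item with both fields so each CSV row has a Title and a Link.
        # Yielding response.meta itself would also export Scrapy's internal keys
        # such as depth and download_latency.
        yield {'Title': title, 'Link': response.meta['Link']}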