How do I scrape data from multiple pages into the same row?
main page > next page (scrape title) > sub page (scrape img)
In my case:
all products > product page n°1 (scrape title) > sub page product n°1 (scrape img)
> product page n°2 (scrape title) > sub page product n°2 (scrape img)
My resulting JSON (bad):
Desired result:
The scraped data is correct, but the structure is not. How can I scrape data from multiple pages into the same row?
import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'quotesbij'
    allowed_domains = ['test.com']
    start_urls = ['http://test.com']

    # Page 1: collect the URL of every product
    def parse(self, response):
        urls = response.css('div.item > div.info > h3 > a::attr(href)').extract()
        for url in urls:
            url = response.urljoin(url)
            yield scrapy.Request(url=url, callback=self.parse_details_product)

    # Page 2: scrape the title from each product page
    def parse_details_product(self, response):
        yield {
            'title': response.css('div.detail-wrap > h1::text').extract(),
        }
        # Page 2 (same page as the title): scrape the photo URL and follow it
        url_img = response.css('div.ui-image-viewer-thumb-wrap > a::attr(href)')[0].extract()
        url_img = response.urljoin(url_img)
        yield scrapy.Request(url=url_img, callback=self.parse_reviews)

    # Page 3: scrape the photo
    def parse_reviews(self, response):
        yield {
            'img_hd_1': response.css('a > img::attr(src)').extract(),
        }
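For reference, the usual Scrapy pattern for getting fields from several pages into one row is to build the item across callbacks and yield it only once, in the last callback, carrying the partial item along with the follow-up request. Below is a minimal sketch of that pattern reusing the selectors from the spider above; the spider name, the use of cb_kwargs, and the fallback when no image link is found are illustrative assumptions, not code from the original post. On Scrapy versions older than 1.7, request.meta can carry the item instead of cb_kwargs.

import scrapy


class ProductSpider(scrapy.Spider):
    # Placeholder name/domain mirroring the question's test.com example.
    name = 'quotesbij_onerow'
    allowed_domains = ['test.com']
    start_urls = ['http://test.com']

    def parse(self, response):
        # Page 1: follow every product link.
        for url in response.css('div.item > div.info > h3 > a::attr(href)').extract():
            yield response.follow(url, callback=self.parse_details_product)

    def parse_details_product(self, response):
        # Page 2: start the item with the title, but do not yield it yet.
        item = {'title': response.css('div.detail-wrap > h1::text').extract()}
        url_img = response.css('div.ui-image-viewer-thumb-wrap > a::attr(href)').extract_first()
        if url_img:
            # Hand the partial item to the next callback (cb_kwargs needs Scrapy >= 1.7;
            # with older versions, pass meta={'item': item} and read response.meta['item']).
            yield response.follow(url_img, callback=self.parse_reviews, cb_kwargs={'item': item})
        else:
            # Assumed fallback: no sub page found, so yield what we have.
            yield item

    def parse_reviews(self, response, item):
        # Page 3: complete the same item with the image URL and yield one row per product.
        item['img_hd_1'] = response.css('a > img::attr(src)').extract()
        yield item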
Thanks.