I have the following code:

    import scrapy
    import re


    class NamePriceSpider(scrapy.Spider):
        name = 'namePrice'
        start_urls = [
            'https://www.cotodigital3.com.ar/sitios/cdigi/browse/'
        ]

        def parse(self, response):
            all_category_products = response.xpath('//*[@id="products"]')
            for product in all_category_products:
                # Both XPaths start with '//', so they match across the whole page
                # and each field comes back as a list with one entry per product.
                name = product.xpath('//div[@class="descrip_full"]/text()').extract()
                price = product.xpath(
                    '//span[@class="atg_store_productPrice" and not(@style)]'
                    '/span[@class="atg_store_newPrice"]/text()'
                    ' | //span[@class="price_discount"]/text()'
                ).re(r'\$\d{1,5}(?:[.,]\d{3})*(?:[., ]\d{2})*')
                yield {'name': name,
                       'price': price}
            next_page = response.xpath('//a[@title = "Siguiente"]/@href').extract_first()
            # Join the URL only when a link was found; urljoin(None) returns the
            # current page URL, which would make the check below always true.
            if next_page:
                yield scrapy.Request(url=response.urljoin(next_page), callback=self.parse)

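It works well and scrapes product names and prices across multiple pages of a supermarket website. The problem is that when I export everything to a JSON file, I get a separate structure for each page, for example {"name": ["a", "b", "c"], "price": ["10", "20", "30"]} for one page and {"name": ["d", "f", "g"], "price": ["40", "50", "60"]} for another. I would like a single structure covering all pages, so that it is easier to iterate over: {"name": ["a", "b", "c", "d", "f", "g"], "price": ["10", "20", "30", "40", "50", "60"]}. Is there a way to do this?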
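
To make the desired result concrete, here is a minimal sketch of the merge as a post-processing step in plain Python. It assumes the per-page items are already available as a list of dicts (Scrapy's JSON feed export writes the items as one JSON array, so `json.load` on the output file would give such a list). `merge_items` is a hypothetical helper name, not part of Scrapy, and this is only one possible approach.

```python
from collections import defaultdict


def merge_items(items):
    """Combine per-page dicts of lists into a single dict of lists.

    Hypothetical helper for illustration; not a Scrapy API.
    """
    merged = defaultdict(list)
    for item in items:
        for key, values in item.items():
            # Append this page's values after those of earlier pages.
            merged[key].extend(values)
    return dict(merged)


# Sample data taken from the example structures in the question.
pages = [
    {"name": ["a", "b", "c"], "price": ["10", "20", "30"]},
    {"name": ["d", "f", "g"], "price": ["40", "50", "60"]},
]
combined = merge_items(pages)
print(combined)
```

The merged lists follow the order of the input list. Scrapy fetches pages asynchronously, so with several start URLs the items are not guaranteed to be written in page order.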