Scraping items with multiple variations, not as a list or dict, but each on its own row (multiple rows)

Asked: 2018-06-30 21:09:41

Tags: python csv scrapy

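I am trying to scrape product information (title, price, variations, etc.) and export it to csv.
The title, price and other data that hold a single element should each occupy one cell, but data with multiple variations should get one row per variation, as in a spreadsheet.
In my code the columns color name, color image url and color variation url all have many variations, and I want each element on a new row instead of a list inside one cell. How can I do that?
My code: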

import scrapy


class KapserSpider(scrapy.Spider):
    name = 'kapser2'
    allowed_domains = ['xpressprofil.no']
    start_urls = ['http://www.xpressprofil.no/sortiment/kapser-hodeplagg/']

    def parse(self, response):
        # Follow every product on the listing page and carry the listing price along.
        items = response.xpath('//div[contains(@class, "listing")]')
        for item in items:
            first_url = item.xpath('.//a/@href').extract_first()
            absolute_url = response.urljoin(first_url)
            price = item.xpath('.//span[@class="listing_price "]/text()').extract_first()
            yield scrapy.Request(absolute_url, callback=self.parse_item, meta={'price': price})

        # Only paginate when a "next" link is actually present.
        next_page_url = response.xpath('//li[@rel="next"]/a/@href').extract_first()
        if next_page_url:
            absolute_next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(absolute_next_page_url, callback=self.parse)

    def parse_item(self, response):
        item = {}
        # Fields with a single value.
        item['title'] = response.xpath('//h3[@class="margin-top-0"]/text()').extract_first()
        item['price'] = response.meta['price']
        item['category'] = response.css('.breadcrumb li:nth-child(2) a::text').extract_first()
        item['sub_category'] = response.css('.breadcrumb li:nth-last-child(2) a::text').extract_first()
        item['image'] = response.xpath('//a[@class="thumbnail"]/img/@src').extract_first()
        item['url'] = response.url
        item['description'] = response.xpath('//div[contains(@class, "product-details")]//p/text()').extract_first()
        # Price table: keys come from the <th> cells, values from the <td> cells.
        sale_price = [x.strip() for x in response.xpath('//th[contains(@class, "text-right")]/text()').extract()]
        # Python 2: encode the unicode strings to UTF-8 byte strings.
        sale_price_normalize = [x.encode('utf-8') for x in sale_price]
        min_order_amount = [y.strip() for y in response.xpath('//td[contains(@class, "text-right")]/text()').extract()]
        min_order_amount_normalize = [y.encode('utf-8') for y in min_order_amount]
        item['sales'] = dict(zip(sale_price_normalize, min_order_amount_normalize))
        # Fields that can hold several values (one per variation).
        item['color'] = response.xpath('//div[contains(@class, "product-details")]//a[@href[contains(., "htm")]]/img/@alt').extract()
        item['gtin_barcode'] = response.xpath('//div[@id="spec"]//td[contains(., "EAN")]/following-sibling::td[1]/text()').extract()
        item['brand'] = ""
        item['mpn'] = response.xpath('//div[@id="spec"]//td[contains(., "Artikkelnummer")]/following-sibling::td[1]/text()').extract()
        item['delivery_time'] = response.xpath('//div[@id="spec"]//td[contains(., "Leveransetid") or contains(., "Leveringstid")]/following-sibling::td[1]/text()').extract()
        #variations = response.xpath('//select[@data-name="Merking"]//option/text()').extract()

        yield item
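
One possible way to get a row per variation without touching the exporter is to yield one item per variation from the spider itself. The following is only a minimal sketch: `explode_item` and its `multi_fields` parameter are hypothetical helpers, not part of Scrapy, and it assumes the multi-valued fields are lists that line up by position.

```python
from itertools import zip_longest  # on Python 2 this is itertools.izip_longest


def explode_item(item, multi_fields):
    """Yield one flat dict per variation.

    Scalar fields are copied onto every row; each field named in
    multi_fields contributes its n-th value to the n-th row, and
    shorter lists are padded with ''.
    """
    base = {k: v for k, v in item.items() if k not in multi_fields}
    columns = [item.get(name) or [] for name in multi_fields]
    if not any(columns):
        # No variations at all: emit the item once, with empty cells.
        row = dict(base)
        row.update((name, '') for name in multi_fields)
        yield row
        return
    for values in zip_longest(*columns, fillvalue=''):
        row = dict(base)
        row.update(zip(multi_fields, values))
        yield row


item = {'title': 'Basic bomullscaps med 5 paneler',
        'color': ['Hvit', 'Gul']}
rows = list(explode_item(item, ['color']))
```

In `parse_item` this would replace the final `yield item` with a loop such as `for row in explode_item(item, ['color']): yield row`. Note that this repeats the title, price and so on in every row; if those should appear only on the first row, the padding has to happen at export time instead.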

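Update: yes, "How can i export scraped data to csv file in the right format?" has already been answered and my problem is very similar, but that solution does not work for me. I still get the data in a single row as [red, green, black] instead of one row per variation. As far as I understand, the problem lies in how the values are iterated over in exporters.py, but I cannot work out where.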

from itertools import izip_longest
from scrapy.contrib.exporter import CsvItemExporter
from scrapy.conf import settings

class NewLineRowCsvItemExporter(CsvItemExporter):

    def __init__(self, file, include_headers_line=True, join_multivalued=',', **kwargs):
        super(NewLineRowCsvItemExporter, self).__init__(file, include_headers_line, join_multivalued, **kwargs)

    def export_item(self, item):
        if self._headers_not_written:
            self._headers_not_written = False
            self._write_headers_and_set_fields_to_export(item)

        fields = self._get_serialized_fields(item, default_value='',
                                             include_empty=True)
        values = list(self._build_row(x for _, x in fields))

        values = [
            (val[0] if len(val) == 1 and type(val[0]) in (list, tuple) else val)
            if type(val) in (list, tuple)
            else (val, )
            for val in values]

        multi_row = izip_longest(*values, fillvalue='')

        for row in multi_row:
            self.csv_writer.writerow(row)
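
To see where the rows might collapse, the row-building step of `export_item` can be tried in isolation. The sketch below copies that logic into a hypothetical `expand_rows` function and runs it on plain Python values, without Scrapy. It rests on an assumption that should be checked against the installed Scrapy version: that list fields have already been joined into a single string (via `join_multivalued`) by the time `_get_serialized_fields` returns them.

```python
from itertools import zip_longest  # izip_longest on Python 2


def expand_rows(values):
    """Apply the same normalisation and zip as export_item above."""
    values = [
        (val[0] if len(val) == 1 and type(val[0]) in (list, tuple) else val)
        if type(val) in (list, tuple)
        else (val, )
        for val in values]
    return list(zip_longest(*values, fillvalue=''))


# A list that reaches this step intact is spread over several rows.
as_lists = expand_rows(['Cap', ['Hvit', 'Gul']])
# A list that was already joined into one string stays in a single cell.
as_joined = expand_rows(['Cap', ','.join(['Hvit', 'Gul'])])
```

If the assumption holds, the `izip_longest` call never sees a list, which would match the single-row output described here; the lists would then have to reach `export_item` unjoined before this code can split them.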

Output data format in the csv: Basic bomullscaps med 5 paneler,Kapser&Hodeplagg,,Kapser,http://www.xpressprofil.no/sortiment/kapser-hodeplagg/kapser/89545340-225584/basic-bomullscaps-med-5-paneler.htm,,"16,22 kr",,"{'1':'17,91','100':'17,05','250':'16,55','1000':'16,22'}","Hvit,Svart ensfarget,Marineblå,Kongeblå,Rød,Kaki,Grå,Eplegrønn,Oransje,"Gul",11106604,//prodimg.unpr.io/itemimage/800/products/500/159637103c53ba.jpg,Basic bomullscaps med 5 paneler. Bomull. 155-160 g/m2.
I have lists and a dict in here; maybe that is the problem?
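
For reference, lists and dicts can be handled uniformly if every field is first turned into a column of cells. The sketch below uses only the standard `csv` module rather than a Scrapy exporter; `as_column` and `write_item` are hypothetical names, and rendering each dict entry as a "key: value" cell is just one possible choice.

```python
import csv
import io
from itertools import zip_longest  # izip_longest on Python 2


def as_column(value):
    """Turn any field value into a list of cell values."""
    if isinstance(value, dict):
        # One "key: value" cell per entry (insertion order on Python 3.7+).
        return ['%s: %s' % (k, v) for k, v in value.items()]
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def write_item(writer, item, fieldnames):
    """Write one CSV row per variation; scalars appear on the first row only."""
    columns = [as_column(item.get(name, '')) for name in fieldnames]
    for row in zip_longest(*columns, fillvalue=''):
        writer.writerow(row)


buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
item = {'title': 'Cap',
        'sales': {'1': '17,91', '100': '17,05'},
        'color': ['Hvit', 'Svart ensfarget', 'Gul']}
write_item(writer, item, ['title', 'sales', 'color'])
csv_text = buffer.getvalue()
```

Here the title lands only on the first row and the shorter columns are padded with empty cells, which is the spreadsheet-like layout asked for above.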

0 answers:

No answers yet