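Scrapy: How to output items in a specific JSON format

Time: 2017-03-26 00:46:22

Tags: python json scrapy

I am exporting my scraped data as JSON. By default, the Scrapy exporter outputs a list of dicts in JSON format. The items look like this: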

[{"Product Name":"Product1", "Categories":["Clothing","Top"], "Price":"20.5", "Currency":"USD"},
{"Product Name":"Product2", "Categories":["Clothing","Top"], "Price":"21.5", "Currency":"USD"},
{"Product Name":"Product3", "Categories":["Clothing","Top"], "Price":"22.5", "Currency":"USD"},
{"Product Name":"Product4", "Categories":["Clothing","Top"], "Price":"23.5", "Currency":"USD"}, ...]

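But I want to export the data in a specific format like this, with the product list nested inside a single shop object: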

{
"Shop Name":"Shop 1",
"Location":"XXXXXXXXX",
"Contact":"XXXX-XXXXX",
"Products":
[{"Product Name":"Product1", "Categories":["Clothing","Top"], "Price":"20.5", "Currency":"USD"},
{"Product Name":"Product2", "Categories":["Clothing","Top"], "Price":"21.5", "Currency":"USD"},
{"Product Name":"Product3", "Categories":["Clothing","Top"], "Price":"22.5", "Currency":"USD"},
{"Product Name":"Product4", "Categories":["Clothing","Top"], "Price":"23.5", "Currency":"USD"}, ...]
}

Please suggest any solution. Thanks.

2 answers:

Answer 0: (score: 3)

This is well documented on the Scrapy website here.

from scrapy.exporters import JsonItemExporter


class ItemPipeline(object):

    file = None

    def open_spider(self, spider):
        # Scrapy's item exporters write bytes, so open the file in binary mode
        self.file = open('item.json', 'wb')
        self.exporter = JsonItemExporter(self.file)
        self.exporter.start_exporting()

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

This creates a JSON file containing your items.
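The pipeline above still writes a flat list, while the question asks for the products nested under shop-level fields. A minimal sketch of one way to get that shape is to collect the items in memory and write a single wrapping object when the spider closes. The class name `ShopJsonPipeline`, the `shop_info` values (copied from the question's placeholders), and the file name `shop.json` are assumptions for illustration, not part of Scrapy.

```python
import json


class ShopJsonPipeline(object):
    """Collect every item, then write one JSON object that wraps them."""

    # Placeholder shop details copied from the question; in a real project
    # they would probably come from the spider or the settings (assumption).
    shop_info = {
        "Shop Name": "Shop 1",
        "Location": "XXXXXXXXX",
        "Contact": "XXXX-XXXXX",
    }
    output_path = "shop.json"  # hypothetical file name

    def open_spider(self, spider):
        self.products = []

    def process_item(self, item, spider):
        # dict(item) works for both Scrapy Item objects and plain dicts.
        self.products.append(dict(item))
        return item

    def close_spider(self, spider):
        # Copy the shop fields, then attach the collected products.
        document = dict(self.shop_info)
        document["Products"] = self.products
        with open(self.output_path, "w") as f:
            json.dump(document, f, indent=4)
```

Like any item pipeline, it has to be enabled through the `ITEM_PIPELINES` setting in the project's settings.py. Note that this approach keeps every item in memory until the spider finishes, which is fine for a single shop's catalogue but may not suit very large crawls.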

Answer 1: (score: 1)

I was trying to export pretty-printed JSON, and this worked for me.

I created a pipeline that looks like this:

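import json


class JsonPipeline(object):

    def open_spider(self, spider):
        # Text mode, because json.dumps returns str
        self.file = open('your_file_name.json', 'w')
        self.file.write("[")
        self.first_item = True

    def close_spider(self, spider):
        self.file.write("\n]")
        self.file.close()

    def process_item(self, item, spider):
        # Write the separator before every item except the first, so the
        # array has no trailing comma and stays valid JSON
        if not self.first_item:
            self.file.write(",")
        self.first_item = False

        line = json.dumps(
            dict(item),
            sort_keys=True,
            indent=4,
            separators=(',', ': ')
        )

        self.file.write("\n" + line)
        return item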

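It is similar to the example in the Scrapy docs at https://doc.scrapy.org/en/latest/topics/item-pipeline.html, except that it prints each JSON property indented and on its own line.

See the section on pretty printing here: https://docs.python.org/2/library/json.html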
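To see what the pretty-printing arguments do on their own, here is a small standalone comparison using only the standard `json` module. The sample `product` dict is made up from the question's data for illustration.

```python
import json

# Sample item, shortened from the question's data (illustrative only).
product = {"Product Name": "Product1", "Price": "20.5", "Currency": "USD"}

# Default output: everything on a single line.
compact = json.dumps(product)

# Pretty output: keys sorted, one property per line, 4-space indent.
pretty = json.dumps(product, sort_keys=True, indent=4, separators=(",", ": "))

print(compact)
print(pretty)
```

Both strings parse back to the same dict; only the whitespace and key order differ.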