Question

I am currently running Scrapy with the following command-line arguments:

scrapy crawl my_spider -o data.json

However, I would prefer to "save" this command in a Python script. Following https://doc.scrapy.org/en/latest/topics/practices.html, I have the following script:

import scrapy
from scrapy.crawler import CrawlerProcess

from apkmirror_scraper.spiders.sitemap_spider import ApkmirrorSitemapSpider

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(ApkmirrorSitemapSpider)
process.start() # the script will block here until the crawling is finished

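However, it is not clear to me from the documentation what the equivalent of the -o data.json command-line argument should be in the script. How can I make the script generate a JSON file?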

Answer 1

You need to add FEED_FORMAT and FEED_URI to the settings passed to CrawlerProcess:

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'FEED_FORMAT': 'json',
    'FEED_URI': 'data.json'
})
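
On newer Scrapy releases (2.1 and later, as far as I know), FEED_FORMAT and FEED_URI are deprecated in favor of a single FEEDS setting that maps each output URI to its export options. Below is a minimal sketch under that assumption. The helper names build_settings and run are hypothetical, introduced only for illustration; they are not part of Scrapy.

```python
def build_settings(output_path, output_format="json"):
    """Return a settings dict roughly equivalent to `-o <output_path>`.

    Hypothetical helper, not a Scrapy API.
    """
    return {
        "USER_AGENT": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
        # FEEDS maps each output URI to its export options (assumes Scrapy 2.1+).
        "FEEDS": {output_path: {"format": output_format}},
    }


def run(spider_cls, output_path="data.json"):
    """Run one spider and write its items to output_path (hypothetical helper)."""
    # Imported here so build_settings can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(build_settings(output_path))
    process.crawl(spider_cls)
    process.start()  # blocks here until the crawl is finished
```

With the spider from the question, this would be called as run(ApkmirrorSitemapSpider). On older Scrapy versions, the FEED_FORMAT / FEED_URI pair shown above is the setting to use instead.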
