Question

I have a problem with Scrapy in Docker. When it runs in Docker, my spider does not save the output file with the results.

This is the script I use to start the spider:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from CARS_PL_source_1 import CARSPLsource1Spider

if __name__ == "__main__":
    process = CrawlerProcess(get_project_settings())
    process.crawl(CARSPLsource1Spider)
    process.start()

Inside my pipeline, I have this code to save the output:

from scrapy.exporters import JsonItemExporter


class JsonPipeline(object):
    def __init__(self):
        self.file = open("file.json", 'wb', buffering=0)
        self.exporter = JsonItemExporter(self.file, encoding='utf-8', ensure_ascii=False)
        self.exporter.start_exporting()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()
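
Because the pipeline opens "file.json" with a relative path, the file lands in whatever the process's current working directory is at that moment. The following is only a minimal diagnostic sketch, not a confirmed fix: it resolves the output name to an absolute path so that it can be logged or redirected. The helper name resolve_output_path and the environment variable SCRAPY_OUTPUT_DIR are hypothetical and do not come from Scrapy or from the original project.

```python
import os
from pathlib import Path


def resolve_output_path(filename, env_var="SCRAPY_OUTPUT_DIR"):
    """Return an absolute path for `filename`.

    If the (hypothetical) environment variable `env_var` is set, the file is
    placed in that directory. Otherwise the current working directory is used,
    which is where a bare open("file.json") would write.
    """
    base_dir = Path(os.environ.get(env_var) or os.getcwd())
    # Create the target directory if it does not exist yet.
    base_dir.mkdir(parents=True, exist_ok=True)
    return (base_dir / filename).resolve()


if __name__ == "__main__":
    # Print the location so it shows up in the container logs.
    print("output file will be written to:", resolve_output_path("file.json"))
```

In the pipeline, the open() call could then take resolve_output_path("file.json") instead of the bare name, and printing or logging that path would show where the file actually ends up inside the container. Note that a file written inside a container stays in the container's filesystem unless a volume or bind mount maps that directory to the host.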

My Dockerfile looks like this:

FROM python:3

WORKDIR /usr/src/app
 
COPY car_prices_tool_scrapy/requirements.txt ./
 
RUN pip3 install --no-cache-dir -r requirements.txt

COPY car_prices_tool_scrapy .
 
CMD [ "python", "spiders/run_CARS_PL_source_1.py" ]

Without Docker, the spider saves this file.json. When running in Docker, I can see in the logs that the spider is running and getting results, but I cannot find my file.

When I run docker diff, I get a lot of temporary files, pycache files, and so on, but file.json is nowhere to be seen.

What am I doing wrong?

Docker does not save the Scrapy spider's output file
