How do I exit a Scrapy Python script after the spider stops crawling or hits an exception?

Time: 2017-04-18 15:01:05

Tags: python scrapy

I am trying to run my Scrapy Python script from a bat file in the Windows Task Scheduler every minute.

However, the Python script somehow never exits, and it blocks all of the Task Scheduler's future tasks from starting.

So, my questions are:

  1. How do I exit my Scrapy script gracefully after the spider has finished its run?

  2. How do I exit the Scrapy script when it hits an exception, especially the ReactorNotRunning error?

Thanks all in advance.

Here is the bat file I use to run the Python script:

    @echo off
    python "C:\Scripts\start.py"
    REM pause waits for a keypress and keeps the window (and the scheduled
    REM task) alive; useful only when running the script by hand to debug.
    pause
    

And here is my Python script:

    from cineplex import utils  # assumed: the helper module the script calls but never imports
    from cineplex.spiders import seatings_spider as st  # the question aliases this as both "seat" and "st"; unified as "st"
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    from twisted.internet import reactor, defer

    # The original script never defines PARENT_DIR; this value is a placeholder.
    PARENT_DIR = "C:\\Scripts\\output"


    def crawl_all_showtimes():
        # Create a CrawlerRunner instance to manage multiple spiders
        runner = CrawlerRunner()

        # Create/check the output folder for today
        directory_for_today = utils.create_dir_for_today(PARENT_DIR)

        # Get all cinema ids and names first
        cinema_dict = utils.get_all_cinemas()

        # Queue the crawls; the deferred chain below stops the reactor when done
        crawl_showtimes_helper(directory_for_today, cinema_dict, runner)

        # Start crawling for showtimes (blocks until reactor.stop() is called)
        reactor.run()


    # Helps to run multiple ShowTimesSpiders sequentially
    @defer.inlineCallbacks
    def crawl_showtimes_helper(output_dir, cinema_dict, runner):
        # Iterate through all cinemas to get show timings
        for cinema_id, cinema_name in cinema_dict.items():  # .iteritems() is Python 2 only
            yield runner.crawl(st.ShowTimesSpider, cinema_id=cinema_id,
                               cinema_name=cinema_name, output_dir=output_dir)
        reactor.stop()


    if __name__ == "__main__":

        # Turn on Scrapy logging
        configure_logging()

        # Collect all showtimes (the original called an undefined crawl_all_seatings())
        crawl_all_showtimes()
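
For comparison, here is a minimal, self-contained sketch of a pattern that lets the process end on its own: CrawlerProcess manages the reactor internally, and its start() call returns once every scheduled spider has finished. The ExampleSpider, its URL, and the exit codes are illustrative assumptions, not part of the project above:

    import sys

    import scrapy
    from scrapy.crawler import CrawlerProcess


    class ExampleSpider(scrapy.Spider):
        # Throwaway spider used only to keep this sketch runnable
        name = "example"
        start_urls = ["https://example.com"]

        def parse(self, response):
            yield {"title": response.css("title::text").get()}


    if __name__ == "__main__":
        try:
            process = CrawlerProcess()
            process.crawl(ExampleSpider)
            process.start()  # blocks until the crawl finishes, then stops the reactor
        except Exception:
            sys.exit(1)  # non-zero exit code so Task Scheduler never hangs on a failure
        sys.exit(0)

The same try/except-plus-sys.exit() wrapper can be placed around crawl_all_showtimes() if the sequential CrawlerRunner pattern above is still needed.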
    

1 Answer:

Answer (score: -1):

The program's main thread is blocking some of Scrapy's threads. So in your main program use:

    import sys
    sys.exit()
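
Expanding on that answer, a hedged sketch for question 2: reactor.stop() raises twisted.internet.error.ReactorNotRunning when the reactor has already stopped, so catching that error before calling sys.exit() keeps a duplicate shutdown attempt from crashing or stranding the script. The shutdown() helper name is illustrative:

    import sys

    from twisted.internet import reactor
    from twisted.internet.error import ReactorNotRunning


    def shutdown(exit_code=0):
        # Illustrative helper: stop the reactor if it is still running,
        # then exit so Task Scheduler can launch the next run.
        try:
            reactor.stop()
        except ReactorNotRunning:
            pass  # the reactor already stopped; nothing more to do
        sys.exit(exit_code)

Calling shutdown() after reactor.run() returns in crawl_all_showtimes() keeps the exit logic in one place.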