I am trying to run my Scrapy Python script every minute from a bat file via the Windows Task Scheduler.
However, the Python script somehow never exits, which blocks all future tasks in the Task Scheduler from starting.
So, my questions are:
How do I exit my Scrapy script gracefully after the spider has finished running?
How do I exit the Scrapy script when an exception is raised, especially a ReactorNotRunning error?
Thanks to all in advance.
Here is the bat file that runs my Python script:
@echo off
python "C:\Scripts\start.py"
pause
Here is my Python script:
from cineplex.spiders import seatings_spider as st
from cineplex import utils  # assumed project helper providing create_dir_for_today() and get_all_cinemas()
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
import sys
import time
from twisted.internet import reactor, defer


def crawl_all_showtimes():
    # Create a CrawlerRunner instance to manage multiple spiders
    runner = CrawlerRunner()

    # Check folder for today (PARENT_DIR is defined elsewhere in the project)
    directory_for_today = utils.create_dir_for_today(PARENT_DIR)

    # Get all cinema ids and names first
    cinema_dict = utils.get_all_cinemas()

    # Prepare for crawling
    crawl_showtimes_helper(directory_for_today, cinema_dict, runner)

    # Start crawling for showtimes
    reactor.run()


# Helps to run multiple ShowTimesSpiders sequentially
@defer.inlineCallbacks
def crawl_showtimes_helper(output_dir, cinema_dict, runner):
    # Iterate through all cinemas to get show timings
    for cinema_id, cinema_name in cinema_dict.iteritems():
        yield runner.crawl(st.ShowTimesSpider, cinema_id=cinema_id,
                           cinema_name=cinema_name, output_dir=output_dir)
    reactor.stop()


if __name__ == "__main__":
    # Turn on Scrapy logging
    configure_logging()

    # Collect all showtimes
    crawl_all_showtimes()
Answer 0 (score: -1)
The main thread of your program is blocking some of the Scrapy threads, so call this in your main program:
import sys
sys.exit()
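That alone may not shut down the Twisted reactor, though. A more direct fix is to make sure reactor.stop() always runs, even when a crawl raises, and to guard the call against the reactor already being stopped; Twisted raises twisted.internet.error.ReactorNotRunning in that case. Below is a minimal sketch of the helper from the question rewritten that way (it assumes the same st.ShowTimesSpider, runner, and cinema_dict as in the question):

from twisted.internet import reactor, defer
from twisted.internet.error import ReactorNotRunning

@defer.inlineCallbacks
def crawl_showtimes_helper(output_dir, cinema_dict, runner):
    try:
        # Iterate through all cinemas to get show timings
        for cinema_id, cinema_name in cinema_dict.iteritems():
            yield runner.crawl(st.ShowTimesSpider, cinema_id=cinema_id,
                               cinema_name=cinema_name, output_dir=output_dir)
    finally:
        # Always stop the reactor, even if a crawl failed, but tolerate
        # the case where it has already been stopped.
        try:
            reactor.stop()
        except ReactorNotRunning:
            pass

With the reactor guaranteed to stop, reactor.run() returns, the script falls off the end, and the process exits on its own, so the Task Scheduler can launch the next run. Alternatively, if the spiders do not have to run strictly one after another, scrapy.crawler.CrawlerProcess manages the reactor for you: queue every spider with process.crawl(...) and a single process.start() blocks until all crawls finish and then shuts the reactor down.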