Question

I have a scrapy spider on an Elastic Beanstalk application that I can run over SSH like this:

source /opt/python/run/venv/bin/activate
source /opt/python/current/env
cd /opt/python/current/app
scrapy crawl spidername

I want to set up a cronjob to run this for me, so I followed the advice here.

My setup.config file looks like this:

container_commands:
  01_cron_hemnet:
    command: "cat .ebextensions/spider_cron.txt > /etc/cron.d/crawl_spidername && chmod 644 /etc/cron.d/crawl_spidername"
    leader_only: true

My spider_cron.txt file looks like this:

# The newline at the end of this file is extremely important.  Cron won't run without it.
* * * * * root sh /opt/python/current/app/runcrawler.sh &>/tmp/mycommand.log
# There is a newline here.

My runcrawler.sh file is located at /opt/python/current/app/runcrawler.sh and looks like this:

#!/bin/bash

cd /opt/python/current/app/
PATH=$PATH:/usr/local/bin
export PATH
scrapy crawl spidername

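I can navigate to /etc/cron.d/ and see that crawl_spidername exists there. But when I run crontab -l or crontab -u root -l, it says no crontab exists.

I get no log errors and no deployment errors, and the /tmp/mycommand.log file that the cron job should write to is never created. It seems the cronjob never starts.

Any ideas?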

Answer 1

Your spider_cron.txt has an extra space after /opt/python/current/app/ but before scrapy. So the command being run is just the folder "/opt/python/current/app/".

Yours:

40 9 * * * root /opt/python/current/app/ scrapy crawl spidername > /dev/null

Should be:

40 9 * * * root /opt/python/current/app/scrapy crawl spidername > /dev/null
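To illustrate why the space matters, here is a minimal sketch that reproduces the effect in a scratch directory. The temporary directory and the stub scrapy script are stand-ins invented for this illustration; they are not the real /opt/python/current/app or the real scrapy executable.

```shell
#!/bin/bash
# Hypothetical scratch directory standing in for /opt/python/current/app.
app_dir=$(mktemp -d)

# Stub standing in for the scrapy executable (assumption, illustration only).
cat > "$app_dir/scrapy" <<'EOF'
#!/bin/sh
echo "would run: scrapy $*"
EOF
chmod +x "$app_dir/scrapy"

# With the stray space, the shell tries to execute the directory itself
# and passes "scrapy crawl spidername" to it as arguments, so it fails.
sh -c "$app_dir/ scrapy crawl spidername" >/dev/null 2>&1
broken_status=$?

# Without the space, the path names the executable and the command runs.
fixed_output=$(sh -c "$app_dir/scrapy crawl spidername")
fixed_status=$?

echo "with space:    exit status $broken_status"
echo "without space: exit status $fixed_status ($fixed_output)"

rm -rf "$app_dir"
```

The first invocation should report a nonzero exit status, because a directory cannot be executed; the second should succeed.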

Does typing exactly "/opt/python/current/app/scrapy crawl spidername" start your crawler?
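When trying a command by hand, keep in mind that an interactive SSH session usually has a richer environment than cron provides. The sketch below runs a command under a stripped-down environment to approximate that. The helper name run_like_cron is hypothetical, and the PATH value /usr/bin:/bin is an assumption about cron's default that may differ on your instance.

```shell
#!/bin/bash
# Hypothetical helper: run a command string with a nearly empty environment.
# Assumption: PATH=/usr/bin:/bin approximates what cron gives a job.
run_like_cron() {
    env -i PATH=/usr/bin:/bin SHELL=/bin/sh HOME="$HOME" /bin/sh -c "$1"
}

# Example: check whether scrapy can be found on the cron-like PATH.
run_like_cron 'command -v scrapy || echo "scrapy not found on cron-like PATH"'
```

If the command works in your SSH session but fails here, the difference is most likely a PATH entry or an environment variable that only your login shell sets.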

Cronjob on Elastic Beanstalk not running
