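I have a project that connects Django and Scrapy, and I want to start a spider crawl through a Django management command. The idea is to run it periodically via cron. I am using Django 1.11, Python 3.5, and Scrapy 1.5.
Here is the code of the custom management command, in the file '~/djangoscrapy/src/app/management/commands/run_sp.py':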
from django.core.management.base import BaseCommand
from scrapy.cmdline import execute
import os
from django.conf import settings

os.chdir(settings.CRAWLER_PATH)


class Command(BaseCommand):

    def run_from_argv(self, argv):
        print('In run_from_argv')
        self._argv = argv[:]
        return self.execute()

    def handle(self, *args, **options):
        execute(self._argv[1:])
When I run
$ python manage.py run_sp crawl usc
I get this error:
In run_from_argv
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/home/greendot/lib/python2.7/django/core/management/__init__.py", line 367, in execute_from_command_line
    utility.execute()
  File "/home/greendot/lib/python2.7/django/core/management/__init__.py", line 359, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/greendot/webapps/scraper3/src/app/management/commands/run_sp.py", line 15, in run_from_argv
    return self.execute()
  File "/home/greendot/lib/python2.7/django/core/management/base.py", line 314, in execute
    if options['no_color']:
KeyError: u'no_color'
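My reading of the traceback is that my overridden run_from_argv calls self.execute() without any options, so the base class then looks up options['no_color'] in an empty dict. The snippet below is only a simplified stand-in for that lookup, to show the mechanism; fake_execute and try_call are hypothetical names, not Django's actual code.

```python
def fake_execute(**options):
    """Hypothetical stand-in for the option lookup shown in the traceback."""
    # Mirrors the failing line `if options['no_color']:`; the real method does more.
    if options['no_color']:
        return "no color"
    return "color"


def try_call(**options):
    """Call fake_execute and report a missing option instead of crashing."""
    try:
        return fake_execute(**options)
    except KeyError as exc:
        return "missing option: {}".format(exc.args[0])
```

Calling try_call() with no arguments reports the missing no_color key, while passing no_color explicitly avoids the lookup error.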
My project structure is as follows:
SRC
├── app
│   ├── __init__.py
│   ├── admin.py
│   ├── management
│   │   └── commands
│   │       ├── __init__.py
│   │       ├── run_sp.py
│   ├── models.py
│   └── views.py
├── example_bot
│   ├── dbs
│   ├── example_bot
│   │   ├── __init__.py
│   │   ├── items.py
│   │   ├── middlewares.py
│   │   ├── pipelines.py
│   │   ├── settings.py
│   │   └── spiders
│   │       ├── __init__.py
│   │       ├── __pycache__
│   │       └── usc.py
│   └── scrapy.cfg
├── example_project
│   ├── __init__.py
│   ├── __pycache__
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── manage.py
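I added the line below to my Django settings file so that the management command runs with 'example_bot' as its working directory, because the 'scrapy crawl' command is only available inside the Scrapy project directory, not in BASE_DIR.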
CRAWLER_PATH = os.path.join(BASE_DIR, 'example_bot/')
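Because the os.chdir call in my command module runs at import time and is never undone, a narrower alternative might be to switch directories only while the crawl runs. A minimal sketch, assuming a hypothetical helper named working_directory (it is not part of Django or Scrapy):

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore the original directory even if the body raises or exits.
        os.chdir(previous)
```

Inside handle(), the Scrapy call could then be wrapped in `with working_directory(settings.CRAWLER_PATH):` instead of changing directory at module level.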
I can't seem to get this working, so any help would be greatly appreciated.