Running 2 spiders with Scrapy

Time: 2015-05-22 13:02:04

Tags: python scrapy

How can I run 2 spiders one after the other? When I run this, only the first spider runs, never the second. Is there a way to wait for one to finish?:

from scrapy import cmdline

cmdline.execute("scrapy crawl spider1".split())
cmdline.execute("scrapy crawl spider2".split())

Edit1: I changed it to use .wait():

spider1 = subprocess.Popen(cmdline.execute("scrapy crawl spider1".split()))
spider1.wait()

spider2 = subprocess.Popen(cmdline.execute("scrapy crawl spider2".split()))
spider2.wait()

I must be doing something wrong, because it still only runs the first one.
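For reference, a minimal sketch of what the Popen version was presumably meant to look like: the command goes to subprocess as an argument list, rather than wrapping cmdline.execute (which runs the crawl inside the current process and calls sys.exit when it finishes, so control never reaches the second spider):

import subprocess

# Start each crawl as a separate OS process; wait() blocks until it exits.
spider1 = subprocess.Popen(["scrapy", "crawl", "spider1"])
spider1.wait()

spider2 = subprocess.Popen(["scrapy", "crawl", "spider2"])
spider2.wait()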

Edit2:

Traceback (most recent call last):
  File "/usr/bin/scrapy", line 9, in <module>
    load_entry_point('Scrapy==0.24.6', 'console_scripts', 'scrapy')()
  File "/usr/lib/pymodules/python2.7/scrapy/cmdline.py", line 109, in execute
    settings = get_project_settings()
  File "/usr/lib/pymodules/python2.7/scrapy/utils/project.py", line 60, in get_project_settings
    settings.setmodule(settings_module_path, priority='project')
  File "/usr/lib/pymodules/python2.7/scrapy/settings/__init__.py", line 109, in setmodule
    module = import_module(module)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named settings

1 Answer:

Answer 0 (score: 2)

I would use subprocess, which has a .wait() method. Alternatively, you can use subprocess.call(), which waits automatically; print its return value to see the exit status of each scrapy crawl command:

import subprocess

# call() blocks until the command finishes and returns its exit status.
spider1 = subprocess.call(["scrapy", "crawl", "spider1"])
print spider1

spider2 = subprocess.call(["scrapy", "crawl", "spider2"])
print spider2

This approach automatically waits for the first spider to finish before starting the second one.
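As an aside, newer Scrapy releases (1.0+) also document running spiders sequentially in a single process using CrawlerRunner with Twisted deferreds. A minimal sketch, assuming spider1 and spider2 are spider names registered in the project (the 0.24.x API used in the question differs):

from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

configure_logging()
runner = CrawlerRunner(get_project_settings())

@defer.inlineCallbacks
def crawl():
    # Each yield waits for the previous crawl to finish before starting the next.
    yield runner.crawl("spider1")
    yield runner.crawl("spider2")
    reactor.stop()

crawl()
reactor.run()  # blocks until the last crawl is done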