I am trying to write a custom Scrapy project command that runs all of my spiders. I found the "Register commands via setup.py entry points" section of the documentation and did the following:
mkdir commands
cd commands
Created the command file crawlall.py:
from scrapy.commands import ScrapyCommand
from scrapy.utils.project import get_project_settings
from scrapy.crawler import Crawler

class Command(ScrapyCommand):

    requires_project = True

    def syntax(self):
        return '[options]'

    def short_desc(self):
        return 'Runs all of the spiders'

    def run(self, args, opts):
        settings = get_project_settings()

        for spider_name in self.crawler.spiders.list():
            crawler = Crawler(settings)
            crawler.configure()
            spider = crawler.spiders.create(spider_name)
            crawler.crawl(spider)
            crawler.start()

        self.crawler.start()
Added COMMANDS_MODULE = 'myprojectname.commands' to the settings.py.
Created the setup.py:
from setuptools import setup, find_packages

setup(name='scrapy-mymodule',
  entry_points={
    'scrapy.commands': [
      'crawlall=cnblogs.commands:crawlall',
    ],
  },
)
Ran the project command with scrapy crawlall, which threw the following error:
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 9, in <module>
    load_entry_point('Scrapy==1.0.0rc2', 'console_scripts', 'scrapy')()
  File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/cmdline.py", line 122, in execute
    cmds = _get_commands_dict(settings, inproject)
  File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/cmdline.py", line 50, in _get_commands_dict
    cmds.update(_get_commands_from_module(cmds_module, inproject))
  File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/cmdline.py", line 29, in _get_commands_from_module
    for cmd in _iter_command_classes(module):
  File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/cmdline.py", line 20, in _iter_command_classes
    for module in walk_modules(module_name):
  File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/utils/misc.py", line 63, in walk_modules
    mod = import_module(path)
  File "/usr/local/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named commands
What should I do? Where is my mistake?
Answer 0 (score: 2)
To make the module detectable, add an __init__.py file to the commands directory:
> pwd # make sure that you are in commands directory
.../commands/
> touch __init__.py # create __init__.py
See more info in another SO thread: What is __init__.py for?
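As a rough way to check the point above, the following Python sketch walks a dotted package name and reports every directory on the way that has no __init__.py. The helper name missing_init_files and the project_root argument are hypothetical, introduced only for illustration; they are not part of Scrapy.

```python
import os


def missing_init_files(project_root, dotted_name):
    """Return the package directories under project_root that lack __init__.py.

    project_root is assumed to be the directory that contains the top-level
    package; dotted_name is a value such as 'myprojectname.commands'.
    """
    missing = []
    current = project_root
    for part in dotted_name.split('.'):
        current = os.path.join(current, part)
        # Each level of the dotted name should be a directory with __init__.py.
        if not os.path.isfile(os.path.join(current, '__init__.py')):
            missing.append(current)
    return missing
```

Calling missing_init_files('.', 'myprojectname.commands') from the project root would list the directories that still need an __init__.py; an empty list means every level is a regular package.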
Answer 1 (score: 0)
A directory with a Python script does not an importable module maketh: you need to add an __init__.py file to the commands directory, as the Python documentation on modules and packages explains:
The __init__.py files are required to make Python treat the directories as containing packages....
The __init__.py file can be empty.
Also, commands needs to be in a directory on sys.path, if it is not already, for Python to find it, as the aforementioned documentation further explains:
When importing [a] package, Python searches through the directories on sys.path looking for the package subdirectory.
The following Python snippet will display your sys.path:
import sys
print(sys.path)
Lastly, read a particularly relevant SO answer in the thread to which Jon referred you for more information.
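To test both conditions at once (package layout and sys.path), a small sketch like the one below tries the same import that Scrapy attempts for COMMANDS_MODULE. The function name can_import and its extra_path parameter are assumptions made for this example only.

```python
import importlib
import sys


def can_import(module_name, extra_path=None):
    """Return True if module_name can be imported, False on ImportError.

    If extra_path is given, it is prepended to sys.path first, which
    mimics running Python from that directory.
    """
    if extra_path is not None and extra_path not in sys.path:
        sys.path.insert(0, extra_path)
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True
```

For the question above, can_import('myprojectname.commands') run from the project root should return True once the package is set up correctly. Note that the setting names myprojectname while the entry point names cnblogs, so it is worth checking whichever name matches the real project package.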
Answer 2 (score: 0)
According to the official documentation [1], you have to write the full path to the command class:
setup(name='scrapy-mymodule',
  entry_points={
    'scrapy.commands': [
      'crawlall=cnblogs.commands.crawlall:Command',
    ],
  },
)
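Where:
Command: your command class, i.e. class Command(ScrapyCommand):
crawlall: the *.py file that contains the command class.
You can also add Scrapy commands from an external library by adding a scrapy.commands section to the entry points of the library's setup.py file.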
The following example adds the my_command command:
from setuptools import setup, find_packages

setup(name='scrapy-mymodule',
  entry_points={
    'scrapy.commands': [
      'my_command=my_scrapy_module.commands:MyCommand',
    ],
  },
)
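To see why the part after the colon matters, here is a simplified sketch of how a string of the form name=package.module:attribute can be resolved. It is only an illustration of the format, not the code setuptools or Scrapy actually run, and the function name resolve_entry_point is hypothetical.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve 'name=package.module:attr' to a (name, object) pair.

    The text before ':' is imported as a module; the text after it is
    looked up as an attribute (possibly dotted) on that module.
    """
    name, _, target = spec.partition('=')
    module_path, _, attr = target.partition(':')
    obj = importlib.import_module(module_path.strip())
    attr = attr.strip()
    if attr:
        # Walk dotted attribute names such as 'Outer.Inner'.
        for part in attr.split('.'):
            obj = getattr(obj, part)
    return name.strip(), obj
```

With the entry point from this answer, the module part is cnblogs.commands.crawlall and the attribute is Command, so the lookup ends at the command class itself rather than at the crawlall module.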