I had a working scrapy project, and then I decided to clean it up. To do so, I moved my database module out of the scrapy part of the project, and now I can no longer import it. The project now looks like this:
myProject/
    database/
        __init__.py
        model.py
        databaseFactory.py
    myScrapy/
        __init__.py
        settings.py
        myScrapy/
            __init__.py
            pipeline.py
            spiders/
                spiderA.py
                spiderB.py
    api/
        __init__.py
    config/
        __init__.py
(Only the files relevant to my question are shown.) I want to use databaseFactory inside scrapy.
I added the following lines to my .bashrc:
PYTHONPATH=$PYTHONPATH:my/path/to/my/project
export PYTHONPATH
So when I start ipython I can do the following:
In [1]: import database.databaseFactory as databaseFactory
In [2]: databaseFactory
Out[2]: <module 'database.databaseFactory' from '/my/path/to/my/project/database/databaseFactory.pyc'>
But...
When I try to launch scrapy with
sudo scrapy crawl spiderName 2> error.log
I get to enjoy the following message:
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 60, in run
    self.crawler_process.start()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 92, in start
    if self.start_crawling():
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 124, in start_crawling
    return self._start_crawler() is not None
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 139, in _start_crawler
    crawler.configure()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 47, in configure
    self.engine = ExecutionEngine(self, self._spider_closed)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 65, in __init__
    self.scraper = Scraper(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/scraper.py", line 66, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 50, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 29, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 42, in load_object
    raise ImportError("Error loading object '%s': %s" % (path, e))
ImportError: Error loading object 'myScrapy.pipelines.QueueExportPipe': No module named database.databaseFactory
Why does scrapy ignore my PYTHONPATH? And what can I do about it now? I really don't want to use sys.path.append() in my code.
Answer 0 (score: 0)
You have to tell python about your PYTHONPATH:
export PYTHONPATH=/path/to/myProject/
Then run scrapy:
sudo scrapy crawl spiderName 2> error.log
Answer 1 (score: 0)
By default, sudo does not run commands in your normal environment, so your PYTHONPATH is forgotten. To make PYTHONPATH survive sudo, follow these steps:
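A minimal sketch of the usual options, assuming a sudoers policy that permits them (verify against your own setup):

# Option 1: ask sudo to preserve your current environment for this one run
sudo -E scrapy crawl spiderName 2> error.log

# Option 2: pass PYTHONPATH through explicitly via env
sudo env "PYTHONPATH=$PYTHONPATH" scrapy crawl spiderName 2> error.log

# Option 3: whitelist the variable permanently in /etc/sudoers (edit with visudo)
Defaults env_keep += "PYTHONPATH"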
Answer 2 (score: -1)
What is wrong with using "sys.path.append()"? I tried many other approaches and concluded that scrapy does not honour "$PYTHONPATH" for user-defined packages. I suspect it loads its directories after the framework has passed its lookup phase. But I tried "sys.path.append()" and it works.
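For completeness, a minimal sketch of that workaround; the hard-coded path mirrors the one from the question, and placing it in settings.py is an assumption (any module loaded before the pipelines would do):

# myScrapy/settings.py -- sketch of the sys.path workaround
import sys

# Make the top-level project directory importable before scrapy resolves
# the classes listed in ITEM_PIPELINES, so that
# `import database.databaseFactory` works inside pipeline.py.
sys.path.append('/my/path/to/my/project')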