I'm new to Scrapy and I'm trying to build my own Downloader Middleware so I can crawl the web through a proxy. I'm getting this error:
Traceback (most recent call last):
File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
result = g.send(result)
File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/crawler.py", line 90, in crawl
six.reraise(*exc_info)
File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/crawler.py", line 72, in crawl
self.engine = self._create_engine()
File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/crawler.py", line 97, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/core/engine.py", line 68, in __init__
self.downloader = downloader_cls(crawler)
File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/middleware.py", line 34, in from_settings
mwcls = load_object(clspath)
File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
mod = import_module(module)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
ImportError: No module named downloaders.downloader_middlewares.proxy_connect
This error happens because Scrapy can't find my middleware. I'm not sure whether it's because I haven't set the path correctly or because there's a mistake in the middleware itself.
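The traceback shows that Scrapy's `load_object` ultimately calls `importlib.import_module` on the module part of the dotted path, so the path in `DOWNLOADER_MIDDLEWARES` must be importable from wherever the crawl runs. The failing lookup can be reproduced outside Scrapy (this is a simplified sketch of what `load_object` does, not its exact implementation):

```python
from importlib import import_module

# load_object splits the dotted path at the last dot and imports the
# module part; if that module isn't on sys.path, you get the same
# ImportError seen in the traceback.
path = "downloaders.downloader_middlewares.proxy_connect.ProxyConnect"
module_path, class_name = path.rsplit(".", 1)
try:
    mod = import_module(module_path)
    obj = getattr(mod, class_name)
except ImportError as exc:
    print("ImportError:", exc)
```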
Here is my project structure:
/chisel
__init__.py
pipelines.py
items.py
settings.py
/downloaders
__init__.py
/downloader_middlewares
__init__.py
proxy_connect.py
/resources
config.json
/spiders
__init__.py
craiglist_spider.py
/spider_middlewares
__init__.py
/resources
craigslist.json
scrapy.cfg
In my settings.py I have:
DOWNLOADER_MIDDLEWARES = {
'downloaders.downloader_middlewares.proxy_connect.ProxyConnect': 100,
'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 110
}
Answer 0 (score: 1):
According to the docs, the path should include the project name (as in 'myproject.middlewares.CustomDownloaderMiddleware'), so I think it should be:
'chisel.downloaders.downloader_middlewares.proxy_connect.ProxyConnect': 100
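To make the fix concrete, here is a minimal sketch of what a proxy-setting middleware like the question's `ProxyConnect` could look like. The class name comes from the question, but the proxy URL, the constructor, and the stand-in request object are assumptions for illustration; in a real project the middleware receives a `scrapy.Request`:

```python
# Hypothetical contents of chisel/downloaders/downloader_middlewares/proxy_connect.py.
class ProxyConnect(object):
    """Downloader middleware that routes requests through a proxy."""

    def __init__(self, proxy_url="http://127.0.0.1:8080"):
        # The proxy URL here is a placeholder, not from the question.
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # Setting request.meta['proxy'] lets the built-in
        # HttpProxyMiddleware (priority 110 in the settings above)
        # actually send the request through the proxy.
        request.meta["proxy"] = self.proxy_url
        return None  # returning None tells Scrapy to keep processing


# Stand-in request so the sketch runs without Scrapy installed.
class FakeRequest(object):
    def __init__(self):
        self.meta = {}

middleware = ProxyConnect()
request = FakeRequest()
middleware.process_request(request, None)
print(request.meta["proxy"])
```

With the corrected 'chisel.'-prefixed path in `DOWNLOADER_MIDDLEWARES`, Scrapy can import this module and instantiate the class.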