When I try to record a crawl with Frontera and Scrapy, it fails with an error saying there is no module named recording. I can't understand why this happens, since I followed the official guide. Please help, thank you. The traceback is:
2017-07-04 15:38:57 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: alexa)
2017-07-04 15:38:57 [scrapy.utils.log] INFO: Overridden settings: {'AUTOTHROTTLE_MAX_DELAY': 3.0, 'DOWNLOAD_MAXSIZE': 10485760, 'SPIDER_MODULES': ['alexa.spiders'], 'CONCURRENT_REQUESTS_PER_DOMAIN': 10, 'CONCURRENT_REQUESTS': 256, 'RANDOMIZE_DOWNLOAD_DELAY': False, 'RETRY_ENABLED': False, 'DUPEFILTER_CLASS': 'alexa.bloom_filter1.BLOOMDupeFilter', 'AUTOTHROTTLE_START_DELAY': 0.25, 'REACTOR_THREADPOOL_MAXSIZE': 20, 'BOT_NAME': 'alexa', 'AJAXCRAWL_ENABLED': True, 'COOKIES_ENABLED': False, 'SCHEDULER': 'frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler', 'DOWNLOAD_TIMEOUT': 120, 'AUTOTHROTTLE_ENABLED': True, 'NEWSPIDER_MODULE': 'alexa.spiders'}
2017-07-04 15:38:57 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.throttle.AutoThrottle']
Unhandled error in Deferred:
2017-07-04 15:38:57 [twisted] CRITICAL: Unhandled error in Deferred:
2017-07-04 15:38:57 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/crawler.py", line 95, in crawl
    six.reraise(*exc_info)
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/crawler.py", line 77, in crawl
    self.engine = self._create_engine()
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named recording
Answer 0 (score: 0)
I ran into the same problem after following the official doc. I found a solution thanks to a scrapinghub blogpost.
The problem is that the official documentation is outdated: it configures a recording middleware that no longer exists:
SPIDER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.recording.CrawlRecorderSpiderMiddleware': 1000,
})
DOWNLOADER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.recording.CrawlRecorderDownloaderMiddleware': 1000,
})
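If you want to confirm what your installed Frontera actually ships, a quick check (my own suggestion, not part of the blogpost) is to list the middleware modules in the package:

# List the middleware modules bundled with the installed Frontera; on recent
# versions there is no 'recording' module, while 'schedulers' is present.
import pkgutil
import frontera.contrib.scrapy.middlewares as middlewares

for _, name, _ in pkgutil.iter_modules(middlewares.__path__):
    print(name)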
Instead of the recording middleware, you need to use the scheduler middleware:
SPIDER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware': 1000,
})
DOWNLOADER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware': 1000,
})
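Putting the pieces together, a Frontera-enabled settings.py could look roughly like the sketch below. The SCHEDULER class is the one already visible in the question's log; the FRONTERA_SETTINGS module path is only a placeholder for wherever you keep your Frontera configuration.

# settings.py -- minimal sketch of the Frontera/Scrapy integration, using the
# scheduler middlewares above; 'alexa.frontera_settings' is a hypothetical module name.
BOT_NAME = 'alexa'
SPIDER_MODULES = ['alexa.spiders']

# Let Frontera drive scheduling (same class as in the question's log).
SCHEDULER = 'frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler'

SPIDER_MIDDLEWARES = {
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware': 1000,
}
DOWNLOADER_MIDDLEWARES = {
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware': 1000,
}

# Module holding Frontera's own settings (backend, limits, etc.) -- placeholder name.
FRONTERA_SETTINGS = 'alexa.frontera_settings'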