"No module named recording" when trying to record a Scrapy crawl

Time: 2017-07-04 10:24:23

Tags: python-2.7 scrapy web-crawler recording frontera

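When I try to record a crawl using Frontera with Scrapy, it fails with an error saying "No module named recording". I can't understand why this happens, since I followed the instructions. Please help, thank you. The traceback is: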

2017-07-04 15:38:57 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: alexa)
2017-07-04 15:38:57 [scrapy.utils.log] INFO: Overridden settings: {'AUTOTHROTTLE_MAX_DELAY': 3.0, 'DOWNLOAD_MAXSIZE': 10485760, 'SPIDER_MODULES': ['alexa.spiders'], 'CONCURRENT_REQUESTS_PER_DOMAIN': 10, 'CONCURRENT_REQUESTS': 256, 'RANDOMIZE_DOWNLOAD_DELAY': False, 'RETRY_ENABLED': False, 'DUPEFILTER_CLASS': 'alexa.bloom_filter1.BLOOMDupeFilter', 'AUTOTHROTTLE_START_DELAY': 0.25, 'REACTOR_THREADPOOL_MAXSIZE': 20, 'BOT_NAME': 'alexa', 'AJAXCRAWL_ENABLED': True, 'COOKIES_ENABLED': False, 'SCHEDULER': 'frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler', 'DOWNLOAD_TIMEOUT': 120, 'AUTOTHROTTLE_ENABLED': True, 'NEWSPIDER_MODULE': 'alexa.spiders'}
2017-07-04 15:38:57 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.throttle.AutoThrottle']
Unhandled error in Deferred:
2017-07-04 15:38:57 [twisted] CRITICAL: Unhandled error in Deferred:

2017-07-04 15:38:57 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/crawler.py", line 95, in crawl
    six.reraise(*exc_info)
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/crawler.py", line 77, in crawl
    self.engine = self._create_engine()
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named recording

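1 answer:

Answer 0 (score: 0)

I ran into the same problem while following the official doc. I found a solution by following a scrapinghub blogpost.

The problem is that the official doc is outdated. It uses a middleware that no longer exists: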

SPIDER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.recording.CrawlRecorderSpiderMiddleware': 1000,
})

DOWNLOADER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.recording.CrawlRecorderDownloaderMiddleware': 1000,
})

Instead of the recording middleware, you need to use the scheduler middleware:

SPIDER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware': 1000,
})

DOWNLOADER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware': 1000,
})
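
To confirm which middleware paths your installed Frontera actually provides before starting a crawl, something like the sketch below may help. It mimics what the traceback shows Scrapy doing (load_object calling import_module on the dotted path). The helper name check_dotted_path is made up for illustration and is not part of Scrapy or Frontera.

```python
from importlib import import_module


def check_dotted_path(path):
    """Return None if `path` resolves to an importable object,
    otherwise a short error message.

    Hypothetical helper for illustration; not a Scrapy or Frontera API.
    """
    # Split 'pkg.module.ClassName' into 'pkg.module' and 'ClassName'.
    module_name, _, attr = path.rpartition('.')
    try:
        module = import_module(module_name)
    except (ImportError, ValueError) as exc:
        # ValueError covers a path with no dot (empty module name).
        return 'cannot import %s: %s' % (module_name, exc)
    if not hasattr(module, attr):
        return '%s has no attribute %s' % (module_name, attr)
    return None


if __name__ == '__main__':
    # The two path prefixes discussed in this answer.
    candidates = [
        'frontera.contrib.scrapy.middlewares.recording.CrawlRecorderSpiderMiddleware',
        'frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware',
    ]
    for path in candidates:
        error = check_dotted_path(path)
        print('%s -> %s' % (path, error or 'OK'))
```

Run it with the same interpreter (virtualenv) that runs the crawl. Any path that reports an import error would fail the same way when Scrapy loads the middleware.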