Configuring DDS with scrapy-splash. Error: No base objects found!

Date: 2016-10-25 17:36:46

Tags: django django-dynamic-scraper scrapy-splash

Hi all,

I have installed Django Dynamic Scraper (DDS) and want to render JavaScript through Splash, so I installed scrapy-splash and pulled the Splash Docker image. The screenshot below shows that the Docker container can be reached.

[Screenshot: Splash docker container]
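
Before blaming the DDS configuration, it can help to confirm that Splash actually renders a page end to end rather than merely answering on the port. The following is a minimal sketch, not part of the original setup, using Splash's render.html endpoint; the target URL is a placeholder:

import requests

# SPLASH_URL from the settings further below; the target URL is a hypothetical example
resp = requests.get(
    'http://192.168.0.150:8050/render.html',
    params={'url': 'http://example.com/', 'wait': 3},
)
print(resp.status_code)   # 200 means Splash fetched and rendered the page
print(len(resp.text))     # the rendered HTML should be non-empty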

However, when I run a test through DDS, it returns the following error:

2016-10-25 17:06:00 [scrapy] INFO: Spider opened
2016-10-25 17:06:00 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-10-25 17:06:00 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-10-25 17:06:05 [scrapy] DEBUG: Crawled (200) <POST http://192.168.0.150:8050/render.html> (referer: None)
2016-10-25 17:06:06 [root] ERROR: No base objects found!
2016-10-25 17:06:06 [scrapy] INFO: Closing spider (finished)
2016-10-25 17:06:06 [scrapy] INFO: Dumping Scrapy stats:

The error occurs when executing:

scrapy crawl my_spider -a id=1
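
For context: the "No base objects found!" message is logged by DDS when the scraper's base XPath matches nothing in the downloaded HTML. A quick way to narrow this down, sketched below with placeholder values for the base XPath and the target URL, is to fetch the page through Splash directly and run that XPath over the rendered HTML:

import requests
from parsel import Selector   # parsel is the selector library Scrapy uses internally

BASE_XPATH = "//div[@class='item']"   # hypothetical: copy the base XPath from your DDS scraper
html = requests.get(
    'http://192.168.0.150:8050/render.html',
    params={'url': 'http://example.com/', 'wait': 3},   # placeholder target URL
).text

matches = Selector(text=html).xpath(BASE_XPATH)
print(len(matches))   # 0 reproduces the DDS error; >0 means the XPath matches the rendered page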

I have configured the scraper in the DDS admin page and checked the checkbox to render JavaScript:

[Screenshot: Admin configuration]

I have followed the scrapy-splash configuration instructions:

# ----------------------------------------------------------------------
# SPLASH SETTINGS
# https://github.com/scrapy-plugins/scrapy-splash#configuration
# ----------------------------------------------------------------------
SPLASH_URL = 'http://192.168.0.150:8050/'

DSCRAPER_SPLASH_ARGS = {'wait': 3}

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# This middleware is needed to support the cache_args feature;
# it allows saving disk space by not storing duplicate Splash arguments
# multiple times in a disk request queue.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

# If you use Scrapy HTTP cache then a custom cache storage backend is required.
# scrapy-splash provides a subclass
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
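
To separate scrapy-splash problems from DDS problems, a minimal standalone spider that reuses the settings above can show whether the middleware stack itself works. This is a sketch with a placeholder start URL, not part of the original project:

import scrapy
from scrapy_splash import SplashRequest

class SplashCheckSpider(scrapy.Spider):
    name = 'splash_check'

    def start_requests(self):
        # Route the request through Splash with the same wait argument as DSCRAPER_SPLASH_ARGS
        yield SplashRequest(
            'http://example.com/',   # placeholder URL
            callback=self.parse,
            args={'wait': 3},
        )

    def parse(self, response):
        # If rendering worked, JavaScript-generated markup is present in response.text
        self.logger.info('rendered %d bytes', len(response.text))

If this spider sees the JavaScript-generated content but DDS does not, the problem is in the DDS scraper configuration rather than in the scrapy-splash setup.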

My assumption is that, with a correct DDS / scrapy-splash configuration, DDS sends the required arguments to the Splash Docker container for rendering. Is that correct?

What am I missing? Do I need to adapt the spider with a Splash script?
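
One thing worth trying before resorting to a Lua script, assuming DDS forwards DSCRAPER_SPLASH_ARGS to Splash unchanged: give slow JavaScript more time to run. Both keys below are standard render.html arguments; the values are guesses, not taken from the original post:

DSCRAPER_SPLASH_ARGS = {
    'wait': 5,           # allow more time for the JS to build the DOM
    'viewport': 'full',  # render the full page instead of the default 1024x768 viewport
}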

0 Answers:

There are no answers yet.