I'm running into a problem with Splash when running my web crawler. First, when I run the following in Docker (I'm on Windows 10 Home, using Docker Toolbox):
docker run -p 8050:8050 scrapinghub/splash
I get this output:
2019-05-16 15:21:49+0000 [-] Log opened.
2019-05-16 15:21:49.752049 [-] Splash version: 3.3.1
2019-05-16 15:21:49.753669 [-] Qt 5.9.1, PyQt 5.9.2, WebKit 602.1, sip 4.19.4, Twisted 18.9.0, Lua 5.2
2019-05-16 15:21:49.753925 [-] Python 3.5.2 (default, Nov 12 2018, 13:43:14) [GCC 5.4.0 20160609]
2019-05-16 15:21:49.754147 [-] Open files limit: 1048576
2019-05-16 15:21:49.754467 [-] Can't bump open files limit
2019-05-16 15:21:49.860067 [-] Xvfb is started: ['Xvfb', ':1506361592', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
2019-05-16 15:21:49.958484 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2019-05-16 15:21:49.958882 [-] memory cache: enabled, private mode: enabled, js cross-domain access: disabled
2019-05-16 15:21:50.108753 [-] verbosity=1, slots=20, argument_cache_max_entries=500, max-timeout=90.0
2019-05-16 15:21:50.110291 [-] Web UI: enabled, Lua: enabled (sandbox: enabled)
2019-05-16 15:21:50.111203 [-] Site starting on 8050
2019-05-16 15:21:50.111508 [-] Starting factory <twisted.web.server.Site object at 0x7fc868f8dcc0>
2019-05-16 15:21:50.112315 [-] Server listening on http://0.0.0.0:8050
I have set up the middlewares and the other Splash settings in my settings.py, correctly as far as I can tell:
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPLASH_URL = 'http://localhost:8050'
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
However, when I run the crawler, I get the following log:
2019-05-16 17:35:10 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://localhost:8050/robots.txt> (failed 1 times): Connection was refused by other side: 10061: Aucune connexion n’a pu être établie car l’ordinateur cible l’a expressément refusée..
2019-05-16 17:35:11 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://localhost:8050/robots.txt> (failed 2 times): Connection was refused by other side: 10061: Aucune connexion n’a pu être établie car l’ordinateur cible l’a expressément refusée..
2019-05-16 17:35:12 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://localhost:8050/robots.txt> (failed 3 times): Connection was refused by other side: 10061: Aucune connexion n’a pu être établie car l’ordinateur cible l’a expressément refusée..
2019-05-16 17:35:12 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET http://localhost:8050/robots.txt>: Connection was refused by other side: 10061: Aucune connexion n’a pu être établie car l’ordinateur cible l’a expressément refusée..
Traceback (most recent call last):
File "C:\Users\coppe\Anaconda3\envs\scrapyEnv\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
twisted.internet.error.ConnectionRefusedError: Connection was refused by other side: 10061: Aucune connexion n’a pu être établie car l’ordinateur cible l’a expressément refusée..
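Sorry for the French parts, but they basically say that the target computer actively refused the connection. I suspect something is wrong with Docker, because when I open http://localhost:8050 in my web browser, I get nothing either (connection failed).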
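For reference, the browser check can be reproduced outside Scrapy with a plain TCP probe. This is only a minimal sketch: the helper name can_connect is made up for illustration, and 192.168.99.100 is an assumption (an address a Docker Toolbox VM commonly gets; running docker-machine ip should report the actual one). With Docker Toolbox the container runs inside a VirtualBox VM, so the published port may be bound on the VM's address rather than on Windows' localhost.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # "192.168.99.100" is an assumed Docker Toolbox VM address, not taken from my logs
    candidates = ["localhost", "192.168.99.100"]
    for host in candidates:
        status = "reachable" if can_connect(host, 8050) else "refused or unreachable"
        print(f"{host}:8050 -> {status}")
```

If only the VM address turns out to be reachable, SPLASH_URL in settings.py would presumably need to point at that address instead of localhost.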
Can anyone help me solve this problem?