My spider crawling via SplashRequest causes Splash to stop unexpectedly after running for a while

Time: 2019-03-28 03:12:43

Tags: python scrapy scrapy-splash

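I am using SplashRequest to execute some JavaScript code through a Lua script. With a short list of URLs everything works fine, but once the list has more than about 50 URLs, Splash stops without reporting any error in its log.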
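Since Splash goes down without logging an error, one way to pin down when it dies is to poll its `_ping` endpoint while the crawl runs. The following is a minimal sketch, not a definitive tool: it assumes Splash is reachable at `http://localhost:8050` (the port mapped in this question) and that `_ping` answers with a JSON body containing `"status": "ok"`, as the Splash HTTP API documentation describes. The function name `splash_is_alive` and the `opener` parameter are hypothetical, introduced here only for illustration.

```python
import json
import urllib.error
import urllib.request


def splash_is_alive(base_url="http://localhost:8050", timeout=5.0,
                    opener=urllib.request.urlopen):
    """Return True if Splash answers its _ping endpoint with status "ok".

    `opener` is injectable so the check can be exercised without a
    running Splash instance (hypothetical helper, not part of Splash).
    """
    url = base_url.rstrip("/") + "/_ping"
    try:
        with opener(url, timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body: treat as down.
        return False
    # Guard against unexpected JSON shapes before reading the status.
    return isinstance(payload, dict) and payload.get("status") == "ok"
```

Calling this every few seconds from a separate script and logging the timestamp of the first `False` would show how far into the URL list the crawl gets before Splash disappears.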

I am running Splash from Docker, and I tried setting a maximum timeout:

docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash --max-timeout 300

My spider also sets a timeout:

yield SplashRequest(
    u, endpoint="render.html", callback=self.parse, dont_filter=True,
    meta={
        "url": u,
        "Keyword": kw,
        "splash": {"endpoint": "execute",
                   "args": {"lua_source": self.script, "wait": 0.5, "timeout": 3600}},
    },
)
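Note that the request above asks for `timeout: 3600` while the server was started with `--max-timeout 300`. According to the Splash documentation, the maximum allowed `timeout` is controlled by `--max-timeout`, so a larger value is likely to be rejected rather than honored. A minimal sketch of keeping the two in step follows; the helper name `build_splash_args` and its default values are assumptions made for illustration, not part of scrapy-splash.

```python
def build_splash_args(lua_source, wait=0.5, timeout=90.0, max_timeout=300.0):
    """Build the `args` dict for Splash's `execute` endpoint.

    `max_timeout` should mirror the server's --max-timeout value
    (300 in the docker command shown earlier). The requested timeout
    is clamped so it never exceeds that limit.
    """
    return {
        "lua_source": lua_source,
        "wait": wait,
        "timeout": min(timeout, max_timeout),
    }


# Possible use inside a spider (assumes scrapy-splash is installed):
#
#   yield SplashRequest(
#       u, callback=self.parse, endpoint="execute", dont_filter=True,
#       args=build_splash_args(self.script, timeout=3600),
#       meta={"url": u, "Keyword": kw},
#   )
```

Passing `endpoint="execute"` and `args` directly to `SplashRequest` also avoids the mismatch in the original call, where `endpoint="render.html"` is set on the request while `meta["splash"]` names `execute`.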

The beginning of my Splash log:

2019-03-27 04:02:18+0000 [-] Log opened.
2019-03-27 04:02:18.376478 [-] Splash version: 3.3.1
2019-03-27 04:02:18.380078 [-] Qt 5.9.1, PyQt 5.9.2, WebKit 602.1, sip 4.19.4, Twisted 18.9.0, Lua 5.2
2019-03-27 04:02:18.380331 [-] Python 3.5.2 (default, Nov 12 2018, 13:43:14) [GCC 5.4.0 20160609]
2019-03-27 04:02:18.380775 [-] Open files limit: 1048576
2019-03-27 04:02:18.380978 [-] Can't bump open files limit
2019-03-27 04:02:18.490614 [-] Xvfb is started: ['Xvfb', ':1118895823', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
2019-03-27 04:02:18.796492 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2019-03-27 04:02:18.796865 [-] memory cache: enabled, private mode: enabled, js cross-domain access: disabled
2019-03-27 04:02:18.993773 [-] verbosity=1, slots=20, argument_cache_max_entries=500, max-timeout=300.0
2019-03-27 04:02:18.994979 [-] Web UI: enabled, Lua: enabled (sandbox: enabled)
2019-03-27 04:02:18.995776 [-] Site starting on 8050
2019-03-27 04:02:18.996131 [-] Starting factory <twisted.web.server.Site object at 0x7f57e820dcf8>
2019-03-27 04:02:18.996736 [-] Server listening on http://0.0.0.0:8050
2019-03-27 04:03:36.389957 [-] "172.17.0.1" - - [27/Mar/2019:04:03:36 +0000] "GET /robots.txt HTTP/1.1" 404 153 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
process 1: D-Bus library appears to be incorrectly set up; failed to read machine uuid: UUID file '/etc/machine-id' should contain a hex string of length 32, not length 0, with no other text
See the manual page for dbus-uuidgen to correct this issue.
qt.network.ssl: QSslSocket: cannot resolve SSLv2_client_method
qt.network.ssl: QSslSocket: cannot resolve SSLv2_server_method

0 answers:

No answers yet