I am trying to use scrapy-splash to render some websites that rely on JS interaction, but if I use splash:wait(), the JS execution gets blocked and some validation fails:
2019-03-12 19:20:08.852022 [network] [139640539265848] GET http://127.0.0.1:17556/
2019-03-12 19:20:08.883925 [network-manager] Download error 403: unknown error (http://127.0.0.1:7054/)
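But if I do not use splash:wait, Splash returns an invalid response without waiting for the JS validation, and the Splash log shows that the requests are executed only after Splash has already returned its response: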
2019-03-12 19:33:12.165731 [events] {"maxrss": 226184, "rendertime": 0.6243276596069336, "active": 0, "timestamp": 1552419192, "qsize": 0, "client_ip": "172.17.0.10", "load": [0.6, 0.52, 0.51], "_id": 139640539265176, "path": "/execute", "user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0", "status_code": 200, "method": "POST", "args": {"lua_source": "\n function main(splash, args)\n \n splash.private_mode_enabled = false\n assert(splash:go{\n splash.args.url,\n headers=splash.args.headers,\n http_method=splash.args.http_method,\n body=splash.args.body,\n })\n \n local entries = splash:history()\n local last_response = entries[#entries].response\n print(\"leoooooooo\")\n print(last_response)\n return {\n \n html = splash:html(),\n }\n end", "proxy": "http://neoway:mp7Bs18ErS@179.61.205.166:60000", "url": "http://consultas.transparencia.mt.gov.br/pessoal/servidores_ativos_orgao/resultado_3.php?mes=1&ano=2017&orgao=7&ficha=125164849", "uid": 139640539265176, "headers": {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0", "Accept-Encoding": "gzip, deflate", "Referer": "http://consultas.transparencia.mt.gov.br/pessoal/servidores_ativos_orgao/resultado_3.php?mes=1&ano=2017&orgao=7&ficha=125164849", "Host": "consultas.transparencia.mt.gov.br", "Accept-Language": "en-US,en;q=0.5"}, "cookies": [], "wait": 10}, "fds": 23}
2019-03-12 19:33:12.166175 [pool] SLOT 0 is available
2019-03-12 19:33:12.178595 [network] [139640539265176] GET http://127.0.0.1:4444/
2019-03-12 19:33:12.182496 [network] [139640539265176] GET http://127.0.0.1:4653/
I am trying to render this website.
Lua script:
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(5))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
Is there any other way to wait for the JS to finish executing?
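For example, something along the lines of splash:wait_for_resume might be an option, since it lets the page-side JavaScript decide when to hand control back instead of relying on a fixed delay. The following is only an untested sketch and not a confirmed fix: the 'table' selector, the 10-second deadline, the 15-second timeout, and the wait_error key are all assumptions chosen for illustration, and I do not know whether it avoids the 403 errors shown above.

```lua
-- Sketch only: wait for a page-side condition with splash:wait_for_resume
-- instead of a fixed splash:wait(). The 'table' selector and the time limits
-- are assumptions; replace them with whatever marks the page as ready.
function main(splash, args)
  assert(splash:go(args.url))

  -- The JavaScript below runs inside the page. splash.resume() returns
  -- control to Lua; splash.error() makes wait_for_resume return nil plus
  -- an error message.
  local result, err = splash:wait_for_resume([[
    function main(splash) {
      var deadline = Date.now() + 10000;
      function check() {
        if (document.querySelector('table')) {
          splash.resume('ready');
        } else if (Date.now() >= deadline) {
          splash.error('timed out waiting for content');
        } else {
          setTimeout(check, 200);
        }
      }
      check();
    }
  ]], 15)

  return {
    html = splash:html(),
    -- Present only when the wait failed or timed out (hypothetical key name).
    wait_error = err,
  }
end
```

The HTML is returned even when the wait fails, so the wait_error field can be inspected to tell a timeout apart from a page that became ready.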
Thanks