Splash blocks JS execution

Time: 2019-03-12 21:09:13

Tags: javascript python scrapy splash

I am trying to use scrapy-splash to render some websites that rely on JS interaction, but if I use splash:wait(), JS execution is blocked and some validations fail:

2019-03-12 19:20:08.852022 [network] [139640539265848] GET http://127.0.0.1:17556/
2019-03-12 19:20:08.883925 [network-manager] Download error 403: unknown error (http://127.0.0.1:7054/)

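But if I don't use splash:wait, Splash returns an invalid response without waiting for the JS validation, and the Splash log shows that the requests were executed after Splash had already returned its response: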

2019-03-12 19:33:12.165731 [events] {"maxrss": 226184, "rendertime": 0.6243276596069336, "active": 0, "timestamp": 1552419192, "qsize": 0, "client_ip": "172.17.0.10", "load": [0.6, 0.52, 0.51], "_id": 139640539265176, "path": "/execute", "user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0", "status_code": 200, "method": "POST", "args": {"lua_source": "\n    function main(splash, args)\n        \n        splash.private_mode_enabled = false\n        assert(splash:go{\n            splash.args.url,\n            headers=splash.args.headers,\n            http_method=splash.args.http_method,\n            body=splash.args.body,\n        })\n        \n        local entries = splash:history()\n        local last_response = entries[#entries].response\n        print(\"leoooooooo\")\n        print(last_response)\n        return {\n            \n            html = splash:html(),\n        }\n    end", "proxy": "http://neoway:mp7Bs18ErS@179.61.205.166:60000", "url": "http://consultas.transparencia.mt.gov.br/pessoal/servidores_ativos_orgao/resultado_3.php?mes=1&ano=2017&orgao=7&ficha=125164849", "uid": 139640539265176, "headers": {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0", "Accept-Encoding": "gzip, deflate", "Referer": "http://consultas.transparencia.mt.gov.br/pessoal/servidores_ativos_orgao/resultado_3.php?mes=1&ano=2017&orgao=7&ficha=125164849", "Host": "consultas.transparencia.mt.gov.br", "Accept-Language": "en-US,en;q=0.5"}, "cookies": [], "wait": 10}, "fds": 23}
2019-03-12 19:33:12.166175 [pool] SLOT 0 is available
2019-03-12 19:33:12.178595 [network] [139640539265176] GET http://127.0.0.1:4444/
2019-03-12 19:33:12.182496 [network] [139640539265176] GET http://127.0.0.1:4653/
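To make the ordering in this log concrete, here is a small sketch that compares the timestamp of the /execute event with the first GET to 127.0.0.1. The timestamps are copied from the log lines above; the variable names are only illustrative.

```python
from datetime import datetime

# Timestamp format used by the Splash log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# Copied from the [events] line (response already returned, status_code 200).
response_logged = datetime.strptime("2019-03-12 19:33:12.165731", FMT)
# Copied from the first [network] GET to 127.0.0.1:4444.
first_local_get = datetime.strptime("2019-03-12 19:33:12.178595", FMT)

# A positive delay means the GET was logged after the response event.
delay_ms = (first_local_get - response_logged).total_seconds() * 1000
print(f"GET logged {delay_ms:.3f} ms after the /execute event")
```

A positive value here is consistent with the page's JS requests starting only after the script had already returned its HTML.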

I am trying to render this website:

http://consultas.transparencia.mt.gov.br/pessoal/servidores_ativos_orgao/resultado_3.php?mes=1&ano=2017&orgao=7&ficha=125164849

Lua script:

function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(5))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
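For comparison, one possible variant of this script polls in short steps until an element appears instead of issuing a single fixed wait. This is only a sketch: it assumes a Splash version that provides splash:select, the selector "table" is a hypothetical placeholder (the page's real marker element is not known from this post), and it is not verified against the site above or against the 403 errors in the log. The Python wrapper just builds the JSON body for the /execute endpoint that appears in the log.

```python
import json

# Hypothetical selector; replace with an element that only exists once the JS has run.
READY_SELECTOR = "table"

# Lua script for Splash: poll in 0.5 s steps, give up after args.max_wait seconds.
LUA_POLL_SCRIPT = """
function main(splash, args)
  assert(splash:go(args.url))
  local waited = 0
  while not splash:select(args.ready_selector) and waited < args.max_wait do
    assert(splash:wait(0.5))
    waited = waited + 0.5
  end
  return {
    html = splash:html(),
    waited = waited,
  }
end
"""


def build_execute_payload(url, ready_selector=READY_SELECTOR, max_wait=10):
    """Return the JSON body for a POST to Splash's /execute endpoint."""
    return json.dumps({
        "lua_source": LUA_POLL_SCRIPT,
        "url": url,
        "ready_selector": ready_selector,
        "max_wait": max_wait,
    })


if __name__ == "__main__":
    # Example URL is a placeholder, not the site from the question.
    print(build_execute_payload("http://example.com/"))
```

The extra keys (ready_selector, max_wait) are illustrative names; Splash exposes any JSON field sent to /execute to the script through args.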

Is there another way to wait for JS execution?

Thanks

0 answers:

No answers yet