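I am using Python Selenium with headless Chrome and BrowserMob Proxy to print every connection that is initiated when a page loads. I am particularly interested in the connections initiated by tracking tags such as utag.js. The problem is that I do not see them when I run the script, yet I do see them in the browser's developer console when I browse the page manually. I suspect I am missing something that triggers the JS, but I cannot figure out what.
Here is my script: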
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import os
import subprocess
from browsermobproxy import Server
proxy_options = {'port': 8888}
server= Server(path="/home/ubuntu/findanalytics/browsermob-proxy-2.1.4/bin/browsermob-proxy", options=proxy_options)
server.start()
proxy= server.create_proxy()
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--window-size=1920,1080")
chrome_options.add_argument("--proxy-server={0}".format(proxy.proxy))
# next one would be possible via chrome options too but does not seem to work
# it is needed to make HTTPS visible in BMP
desired_capabilities = {"acceptInsecureCerts":True}
chrome_driver = os.getcwd()+"/chromedriver"
driver = webdriver.Chrome(chrome_options=chrome_options,executable_path=chrome_driver,desired_capabilities=desired_capabilities)
proxy.new_har("something")
driver.get("https://www.salt.ch")
for connection in proxy.har['log']['entries']:
    print(connection['request']['url'])
server.stop()
driver.quit()
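Can anyone point me to a possible solution? I have searched for a while but found nothing... maybe I am just using the wrong search terms. (salt.ch is used here only as an example, because they use utag.js.)
Update
If I search for something on that page by adding the following after driver.get: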
# search something to play with JS
searchstring = "samsung"
searchfield = driver.find_element_by_name("q")
searchfield.send_keys(searchstring)
searchbtn = driver.find_element_by_id("field-search-submit")
searchbtn.click()
# end
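then I see utag.js fire. However, I want to see it on page load. Any ideas?
"Solution"
OK, I have solved it. It is ugly, and I would still like a real solution, but I now see the connections. I am posting it here in case anyone else runs into this problem, and I also hope someone more skilled than me can offer a better solution. I removed the search part and added a 10-second sleep after driver.get (ugly). The complete, working script looks like this: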
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import os
import subprocess
from browsermobproxy import Server
import time
PICPATH="/home/ubuntu/screenshots/"
proxy_options = {'port': 8888}
server= Server(path="/home/ubuntu/findanalytics/browsermob-proxy-2.1.4/bin/browsermob-proxy", options=proxy_options)
server.start()
proxy= server.create_proxy()
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--window-size=1920,1080")
chrome_options.add_argument("--proxy-server={0}".format(proxy.proxy))
# next one would be possible via chrome options too but does not seem to work
# it is needed to make HTTPS visible in BMP
desired_capabilities = {"acceptInsecureCerts":True}
chrome_driver = os.getcwd()+"/chromedriver"
driver = webdriver.Chrome(chrome_options=chrome_options,executable_path=chrome_driver,desired_capabilities=desired_capabilities)
proxy.new_har("something")
driver.get("https://www.salt.ch")
time.sleep(10)
for connection in proxy.har['log']['entries']:
    print(connection['request']['url'])
server.stop()
driver.quit()
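A possible refinement of the fixed sleep, offered only as a sketch: poll the HAR until a request matching the tag shows up, and give up after a timeout. This assumes the tag requests are simply sent some time after driver.get returns, which the sleep workaround suggests but does not prove. The helper name wait_for_request and its parameters are hypothetical and not part of Selenium or BrowserMob Proxy, and the sketch targets Python 3 (it uses time.monotonic).

```python
import time


def wait_for_request(get_entries, needle, timeout=15.0, poll_interval=0.5):
    """Poll HAR entries until a request URL containing `needle` appears.

    get_entries: zero-argument callable returning a list of HAR entries,
                 i.e. dicts shaped like entry['request']['url'].
    Returns the list of matching URLs, or an empty list if the timeout
    expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Re-read the entries on every pass, since new requests keep arriving.
        urls = [entry['request']['url'] for entry in get_entries()]
        matches = [url for url in urls if needle in url]
        if matches:
            return matches
        if time.monotonic() >= deadline:
            return []
        time.sleep(poll_interval)
```

In the script above this could replace time.sleep(10), for example wait_for_request(lambda: proxy.har['log']['entries'], 'utag.js'), so the script continues as soon as the tag request is recorded instead of always waiting the full 10 seconds.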