Question

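I'm using the Selenium Python webdriver to browse some pages. I want to inject JavaScript code into a page before any other JavaScript code gets loaded and executed. In other words, I need my JS code to be executed as the first JS code of that page. Is there a way to do that with Selenium?

I googled it for a couple of hours, but I couldn't find a proper answer!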

Answer 1

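If you want to inject something into the HTML of a page before it gets parsed and executed by the browser, I would suggest that you use a proxy such as Mitmproxy.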
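A minimal sketch of what such a proxy script could look like, written as a mitmproxy addon script. The payload `INJECTED_JS` and the helper `inject_first_script` are hypothetical names chosen for illustration; the `response(flow)` hook and `flow.response.text` are used on the assumption that a reasonably recent mitmproxy is installed. The regex-based insertion is deliberately naive and only handles documents that contain an explicit `<head>` tag.

```python
import re

# Hypothetical payload; replace with the script that must run first.
INJECTED_JS = "window.__injected = true;"

# Matches the opening <head> tag (with or without attributes), but not <header>.
_HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def inject_first_script(html_text, js_source):
    """Insert an inline <script> immediately after the opening <head> tag."""
    match = _HEAD_OPEN.search(html_text)
    if match is None:
        return html_text  # no <head> found; leave the document untouched
    tag = "<script>" + js_source + "</script>"
    return html_text[:match.end()] + tag + html_text[match.end():]


def response(flow):
    """mitmproxy event hook, called for each completed HTTP response."""
    content_type = flow.response.headers.get("content-type", "")
    body = flow.response.text  # decoded body, or None if it cannot be decoded
    if flow.response.status_code == 200 and "text/html" in content_type and body:
        flow.response.text = inject_first_script(body, INJECTED_JS)
```

Saved as, say, `inject.py`, this would be loaded with `mitmdump -s inject.py`, and the browser driven by Selenium would then be configured to use that proxy. For HTTPS pages the browser also has to trust the mitmproxy CA certificate.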

Answer 2

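If you cannot modify the page content, you may use a proxy, or use a content script in an extension installed in your browser. Doing it within Selenium, you would write some code that injects the script as a child of an existing element, but you won't be able to have it run before the page is loaded (that is, before your driver's get() call returns).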

String name = (String) ((JavascriptExecutor) driver).executeScript(
    "(function () { ... })();" ...

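The documentation leaves unspecified the moment at which the code starts executing. You would want it to run before the DOM starts loading, and that guarantee can probably only be met via the proxy or extension content-script route.

If you can instrument your page with a minimal harness, you can detect the presence of a special URL query parameter and load additional content, but you need to do so using an inline script. Pseudo-code: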

 <html>
    <head>
       <script type="text/javascript">
       (function () {
       if (location && location.href && location.href.indexOf("SELENIUM_TEST") >= 0) {
          var injectScript = document.createElement("script");
          injectScript.setAttribute("type", "text/javascript");

          //another option is to perform a synchronous XHR and inject via innerText.
          injectScript.setAttribute("src", URL_OF_EXTRA_SCRIPT);
          document.documentElement.appendChild(injectScript);

          //optional. cleaner to remove. it has already been loaded at this point.
          document.documentElement.removeChild(injectScript);
       }
       })();
       </script>
    ...

Answer 3

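As of version 1.0.9, selenium-wire has gained the ability to modify responses to requests. Below is an example of this functionality that injects a script into a page before it reaches the web browser.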

from gzip import compress, decompress

from seleniumwire import webdriver
from lxml import html
from lxml.etree import ParserError
from lxml.html import builder

injected = []  # record of every response that was modified, kept for inspection

def inject(req, req_body, res, res_body):
    # various checks to make sure we're only injecting the script on appropriate responses
    # we check that the content type is HTML, that the status code is 200, and that the encoding is gzip
    if res.headers.get_content_subtype() != 'html' or res.status != 200 or res.getheader('Content-Encoding') != 'gzip':
        return None
    try:
        parsed_html = html.fromstring(decompress(res_body))
    except ParserError:
        return None
    try:
        # build a fresh <script> element per response: an lxml element can belong to only one tree
        parsed_html.head.insert(0, builder.SCRIPT('alert("injected")'))
    except IndexError: # no head element
        return None
    injected.append((req, req_body, res, res_body, parsed_html))
    return compress(html.tostring(parsed_html))

drv = webdriver.Firefox(seleniumwire_options={'custom_response_handler': inject})
drv.header_overrides = {'Accept-Encoding': 'gzip'} # ensure we only get gzip encoded responses

Answer 4

Selenium now supports the Chrome DevTools Protocol (CDP) API, so it is very easy to execute a script on every page load. Here is some example code:

driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {'source': 'alert("Hooray! I did it!")'})

It will execute that script for every page load. More information can be found at:

Selenium documentation: https://www.selenium.dev/documentation/en/support_packages/chrome_devtools/
Chrome DevTools Protocol documentation: https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-addScriptToEvaluateOnNewDocument
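For context, a slightly fuller sketch of the CDP approach in Python. The helper `install_early_script` and the `demo` function are hypothetical names introduced here; the sketch assumes a Chromium-based driver (Chrome or Edge), since `execute_cdp_cmd` is not available on other drivers. The script has to be registered before calling `get()`, and per the CDP documentation it is evaluated in each new document before any of the page's own scripts.

```python
def install_early_script(driver, js_source):
    """Register js_source to be evaluated in every new document.

    Returns the identifier that CDP assigns to the registered script,
    or None if the response does not contain one.
    """
    result = driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": js_source}
    )
    return result.get("identifier")


def demo():
    """Example usage; needs Selenium and a local Chrome installation."""
    from selenium import webdriver  # imported here so the helper above has no hard dependency

    driver = webdriver.Chrome()
    try:
        # Register first, navigate second: the script applies to documents created afterwards.
        install_early_script(driver, "window.__injected = true;")
        driver.get("https://example.com")
        print(driver.execute_script("return window.__injected"))
    finally:
        driver.quit()
```

The returned identifier can later be passed to the CDP command `Page.removeScriptToEvaluateOnNewDocument` if the script should stop being injected.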

Answer 5

There is now https://pypi.org/project/selenium-wire/, which makes it relatively easy to access and modify all requests.

Answer 6

So I know it's been a few years, but I've found a way to do this without modifying the webpage's content and without using a proxy! I'm using the Node.js version, but presumably the API is consistent across the other languages as well. What you want to do is as follows:

const {Builder, Capabilities} = require('selenium-webdriver');

(async () => {
  const capabilities = new Capabilities();
  capabilities.setPageLoadStrategy('eager'); // Options are 'eager', 'none', 'normal'
  const driver = await new Builder().forBrowser('firefox').withCapabilities(capabilities).build();
  await driver.get('http://example.com');
  await driver.executeScript(`
    console.log('hello');
  `);
})();

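The 'eager' option works for me. You may need to use the 'none' option. Documentation: https://seleniumhq.github.io/selenium/docs/api/javascript/module/selenium-webdriver/lib/capabilities_exports_PageLoadStrategy.html

EDIT: Note that the 'eager' option has not been implemented in Chrome yet...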
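Since the question asks about the Python bindings, here is a rough Python equivalent of the same idea, assuming Selenium 4, where options objects expose a `page_load_strategy` attribute. The names `use_eager_page_load` and `demo` are hypothetical. Note that with 'eager', `get()` returns once the DOM is ready, so scripts embedded in the page have probably already run by then; this shortens the wait but does not by itself guarantee that the injected code runs first.

```python
def use_eager_page_load(options):
    """Set the page load strategy so get() returns once the DOM is ready."""
    options.page_load_strategy = "eager"  # other values: "normal", "none"
    return options


def demo():
    """Example usage; needs Selenium 4 and a local Firefox installation."""
    from selenium import webdriver  # imported here so the helper above has no hard dependency

    options = use_eager_page_load(webdriver.FirefoxOptions())
    driver = webdriver.Firefox(options=options)
    try:
        driver.get("http://example.com")
        driver.execute_script("console.log('hello');")
    finally:
        driver.quit()
```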

Selenium: How to inject/execute JavaScript in a page before any of the page's other scripts are loaded/executed?
