Question

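I have a page on which I need to automate some tasks and scrape some data, but after it loads, the page runs some JS that injects the data. I can't intercept that data (its format is unsuitable anyway), and I'd like a solution that is fast and light on memory.

I tried fetching the scripts myself and executing them with a headless driver (PhantomJS), but that did not update the page source, and I'm not sure how to get the updated DOM from it.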

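// Fetch the page and collect its <script> elements
var page = GetWebPage(url);
var scripts = page.Html.QuerySelectorAll("script");

// Start a headless PhantomJS session and load the same URL
var phantomDriver = new PhantomJSDriver(PhantomJSDriverService.CreateDefaultService(Directory.GetCurrentDirectory()));
phantomDriver.Navigate().GoToUrl(url);

// Execute the text of each collected script in the loaded page
foreach (var script in scripts)
    phantomDriver.ExecuteScript(script.InnerText);

// The source read here does not contain the injected data
var at = phantomDriver.PageSource;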

Answer 1

You can use waits. According to {{3}}, Selenium has implicit and explicit waits. The example below uses an explicit wait.
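To make the idea of an explicit wait concrete, here is a simplified pure-Python sketch of the kind of polling loop such a wait performs. It is not Selenium's implementation: `wait_until`, `WaitTimeout`, and the default values are hypothetical names and numbers chosen for illustration, and `injected_data` merely stands in for a page whose script adds data shortly after load.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose script injects data a little after load:
# the first two checks find nothing, the third finds the data.
polls = {"count": 0}


def injected_data():
    polls["count"] += 1
    return "data" if polls["count"] >= 3 else None


value = wait_until(injected_data, timeout=2.0, poll_interval=0.01)
```

The point of the pattern is that the caller proceeds as soon as the condition holds instead of sleeping for a fixed time, and gets a clear error if it never does.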

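To use an explicit wait, use WebDriverWait together with ExpectedConditions. I'm not sure which language you are using, but here is an example in Python. It calls WebDriverWait inside a try/except block, allowing up to timeout seconds for the specified ExpectedConditions to be met. As of June 2019, the available conditions are:

Example code in Python: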

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://stackoverflow.com/questions/56724178/executing-page-scripts-before-retrieving-its-contents'
target = (By.XPATH, "//div[@class='gravatar-wrapper-32']")
timeout = 20  # Allow max 20 seconds to find the target

browser = webdriver.Chrome()
browser.get(url)
try:
    WebDriverWait(browser, timeout).until(EC.visibility_of_element_located(target))
except TimeoutException:
    print("Timed out waiting for page to load")
    browser.quit()

The important bit is between try and except; you can modify it to use whichever expected condition you are interested in.
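If none of the built-in conditions matches what the page's script injects, a custom condition may help: `WebDriverWait.until` accepts any callable that takes the driver and returns a truthy value once the wait should end. The sketch below is one possible condition, not part of the original answer; `element_has_text` and the `#results` selector are made-up names for illustration.

```python
def element_has_text(css_selector):
    """Build a wait condition that succeeds once a matching element has text.

    The returned callable takes the driver, as WebDriverWait.until expects,
    and returns the first matching element with non-empty text, else False.
    """
    def condition(driver):
        # "css selector" is the string value of By.CSS_SELECTOR.
        # find_elements returns an empty list when nothing matches.
        for element in driver.find_elements("css selector", css_selector):
            if element.text.strip():
                return element
        return False

    return condition


# Usage with the example above (assumes Selenium is installed and that
# "#results" is where the page's script injects its data):
#
#     WebDriverWait(browser, timeout).until(element_has_text("#results"))
#     html = browser.page_source  # read the source only after the wait succeeds
```

Reading `browser.page_source` only after the wait has succeeded is what should give the content as updated by the page's own scripts, rather than the markup as first delivered.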

Executing page scripts before retrieving its contents
