Question

I have a Python script that uses Selenium to scrape Google search results. It works, but I am looking for a better way to wait until all 100 search results have loaded.

I use this to wait for the search to complete:

driver.wait.until(EC.presence_of_element_located(
    (By.ID, 'resultStats')))

This works, but I need 100 search results, so I do this:

driver.get(driver.current_url+'&num=100')

But now I cannot reuse the following wait, because an element with that ID has already been written to the page:

driver.wait.until(EC.presence_of_element_located(
    (By.ID, 'resultStats')))

Instead I use the following, but it is not reliable (it fails if the request takes longer than 5 seconds):

time.sleep(5)

Code

url = 'https://www.google.com'
driver.get(url)

try:
    box = driver.wait.until(EC.presence_of_element_located(
        (By.NAME, 'q')))
    box.send_keys(query.decode('utf-8'))
    button = driver.wait.until(EC.element_to_be_clickable(
        (By.NAME, 'btnG')))
    button.click()
except TimeoutException:
    error('Box or Button not found in google.com')

try:
    driver.wait.until(EC.presence_of_element_located(
        (By.ID, 'resultStats')))
    driver.get(driver.current_url+'&num=100')

    # Need a better solution to wait until all results are loaded
    time.sleep(5)

    print driver.find_element_by_tag_name('body').get_attribute('innerHTML').encode('utf-8')
except TimeoutException:
    error('No results returned by Google. Could be HTTP 503 response')

Answer 1

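You are absolutely right that time.sleep() is not a good or reliable way to wait for something on a page. You need to use the WebDriverWait class and wait for a specific condition.

In this case, I would wait until the number of elements with class="g" (each one represents a search result) is greater than or equal to 100, using a custom Expected Condition:

from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait


class wait_for_n_elements(object):
    """Expected condition: at least `count` elements match `locator`."""

    def __init__(self, locator, count):
        self.locator = locator
        self.count = count

    def __call__(self, driver):
        try:
            # Public API; does the same lookup as the private
            # EC._find_elements(driver, self.locator) helper.
            count = len(driver.find_elements(*self.locator))
            return count >= self.count
        except StaleElementReferenceException:
            # The page is still being rebuilt; report "not ready yet".
            return False

Usage:

wait = WebDriverWait(driver, 10)
wait.until(wait_for_n_elements((By.CSS_SELECTOR, ".g"), 100))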
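To make it clearer why a callable condition like this replaces the fixed sleep, here is a minimal sketch of the polling idea that runs without a browser. It is not Selenium's implementation: poll_until is a hypothetical, simplified stand-in for WebDriverWait(driver, timeout).until(condition), and FakeDriver is a made-up stub that pretends 40 more results appear on each lookup. The literal "css selector" is the string value behind By.CSS_SELECTOR.

```python
import time


class wait_for_n_elements:
    """Condition: true once at least `count` elements match `locator`."""

    def __init__(self, locator, count):
        self.locator = locator  # (strategy, value) pair
        self.count = count

    def __call__(self, driver):
        return len(driver.find_elements(*self.locator)) >= self.count


def poll_until(driver, condition, timeout=10.0, interval=0.5):
    """Hypothetical, simplified stand-in for WebDriverWait(...).until(...)."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result  # stop as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


class FakeDriver:
    """Made-up stub: every lookup reveals 40 more results, capped at 100."""

    def __init__(self):
        self.loaded = 0

    def find_elements(self, by, value):
        self.loaded = min(self.loaded + 40, 100)
        return ["result"] * self.loaded


driver = FakeDriver()
condition = wait_for_n_elements(("css selector", ".g"), 100)
ok = poll_until(driver, condition, timeout=2.0, interval=0.01)
```

The wait returns as soon as the count is reached and only raises if the deadline passes, so a slow response costs extra time instead of silently yielding an incomplete page, which is the failure mode of time.sleep(5).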
