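I have a script that uses Python and Selenium to scrape Google search results. It works, but I'm looking for a better way to wait until all 100 search results have been fetched.
I use this to wait for the search to finish: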
driver.wait.until(EC.presence_of_element_located(
    (By.ID, 'resultStats')))
This works, but I need 100 search results, so I then do this:
driver.get(driver.current_url+'&num=100')
But now I can't reuse the same wait, because the element with that ID is already on the page:
driver.wait.until(EC.presence_of_element_located(
    (By.ID, 'resultStats')))
So instead I use the following, but it isn't reliable (it breaks if the request takes longer than 5 seconds):
time.sleep(5)

import time

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# Python 2 script. driver, driver.wait (presumably a WebDriverWait),
# query and error() are defined elsewhere in the script.
url = 'https://www.google.com'
driver.get(url)
try:
    box = driver.wait.until(EC.presence_of_element_located(
        (By.NAME, 'q')))
    box.send_keys(query.decode('utf-8'))
    button = driver.wait.until(EC.element_to_be_clickable(
        (By.NAME, 'btnG')))
    button.click()
except TimeoutException:
    error('Box or Button not found in google.com')

try:
    driver.wait.until(EC.presence_of_element_located(
        (By.ID, 'resultStats')))
    driver.get(driver.current_url + '&num=100')
    # Need a better solution to wait until all results are loaded
    time.sleep(5)
    print driver.find_element_by_tag_name('body').get_attribute('innerHTML').encode('utf-8')
except TimeoutException:
    error('No results returned by Google. Could be HTTP 503 response')
Answer (score: 3)
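You're absolutely right that time.sleep() is not a reliable or good way to wait for something on a page. You need to use the WebDriverWait class and wait for a specific condition.

In this case, I would wait until the number of elements with class="g" (these represent the search results) is greater than or equal to 100, using a custom Expected Condition:

from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait


class wait_for_n_elements(object):
    """Expected condition: true once at least `count` elements match `locator`."""

    def __init__(self, locator, count):
        self.locator = locator  # e.g. (By.CSS_SELECTOR, ".g")
        self.count = count

    def __call__(self, driver):
        try:
            # Public driver.find_elements(by, value) instead of the private
            # EC._find_elements helper, which is not part of Selenium's API
            count = len(driver.find_elements(*self.locator))
            return count >= self.count
        except StaleElementReferenceException:
            return False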
Usage:

wait = WebDriverWait(driver, 10)
wait.until(wait_for_n_elements((By.CSS_SELECTOR, ".g"), 100))
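To see what waiting on such a condition amounts to without launching a browser, here is a minimal pure-Python sketch. poll_until, FakeDriver and at_least_n are hypothetical names made up for illustration: poll_until is only a simplified model of a polling wait (call the condition, return its value once it is truthy, otherwise sleep and retry until a deadline), not Selenium's actual WebDriverWait code, and FakeDriver only mimics the find_elements(by, value) call.

```python
import time


def poll_until(condition, driver, timeout=10.0, interval=0.5):
    """Hypothetical helper: call condition(driver) until it returns a truthy
    value, or raise TimeoutError once `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition(driver)
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.2f s" % timeout)
        time.sleep(interval)


class FakeDriver:
    """Hypothetical stand-in for a WebDriver: every find_elements call
    'loads' 30 more results, capped at 100."""

    def __init__(self):
        self.loaded = 0
        self.calls = 0

    def find_elements(self, by, value):
        self.calls += 1
        self.loaded = min(self.loaded + 30, 100)
        return ["result"] * self.loaded


def at_least_n(locator, count):
    # Same idea as wait_for_n_elements: true once enough elements match.
    def condition(driver):
        return len(driver.find_elements(*locator)) >= count
    return condition


driver = FakeDriver()
# "css selector" is a plain string standing in for By.CSS_SELECTOR here.
poll_until(at_least_n(("css selector", ".g"), 100), driver, timeout=5, interval=0.01)
print(driver.loaded, driver.calls)  # prints: 100 4
```

The fake page reaches 30, 60, 90 and then 100 results, so the condition first holds on the fourth poll. The wait ends as soon as the results are present, instead of always sleeping a fixed 5 seconds, and it still fails loudly if they never arrive.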