WebDriver: waiting for an element located by CSS selector

Time: 2018-11-28 20:06:36

Tags: python css python-3.x selenium web-scraping

I want to retrieve the flight prices from this web page using Python 3: https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-12-17;c:EUR;e:1;a:FR;sd:1;t:f;tt:o

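At first I got an error, and after a few hours I realized it was because I wasn't giving the webdriver enough time to load all the elements. So, to make sure there was enough time, I added a time.sleep like this: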

time.sleep(1)

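That made it work! However, I have read, and been advised, that this is not a good solution and that I should use WebDriverWait instead. So, after many hours and several tutorials, I am still trying to work out exactly which CSS class WebDriverWait should wait for.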
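For context, the difference between a fixed sleep and an explicit wait can be sketched in plain Python. This is only an illustration of the polling idea, not Selenium's actual implementation; `wait_until` and `price_loaded` are hypothetical names, and the "page" is simulated with a counter.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Return as soon as the condition holds; no fixed delay is wasted.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page: the "element" only becomes available on the third check.
checks = {"count": 0}


def price_loaded():
    checks["count"] += 1
    return "€18" if checks["count"] >= 3 else None


print(wait_until(price_loaded, timeout=2.0, poll_interval=0.1))
```

Unlike time.sleep(1), a wait of this shape returns as soon as the condition is met and fails with a clear error when it never is, so the timeout can be generous without slowing down the normal case.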

I think the closest I've got is:

WebDriverWait(d, 1).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price")))

Any ideas about what I'm missing?

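2 answers:

Answer 0: (score: 1)

You can target the element with a CSS attribute = value selector or, if that value is dynamic, match it by position with a combination of CSS selectors.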

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-12-17;c:EUR;e:1;a:FR;sd:1;t:f;tt:o")

#element = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR , '[jstcache="9322"]')))
element = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR , '.flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price span + jsl')))
print(element.text)
#driver.quit()

No-results case:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
url ="https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-12-17;c:EUR;e:1;a:FR;sd:1;t:f;tt:o"  #"https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-11-28;c:EUR;e:1;a:FR;sd:1;t:f;tt:o"
driver.get(url)

try:
    status = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR , 'p[role=status]')))
    print(status.text)
except TimeoutException as e:
    element = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR , '.flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price span + jsl')))
    print(element.text)
#driver.quit()
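One drawback of the try/except version is that, when results do exist, it waits out the full timeout for the status message before it looks for the price. A possible alternative, sketched below under assumptions, is a single custom wait condition that returns whichever element shows up first. `any_present` is a hypothetical helper, not part of Selenium; it relies only on the driver's `find_elements` method, and the string "css selector" is the value of `By.CSS_SELECTOR`.

```python
CSS_SELECTOR = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def any_present(selectors):
    """Build a wait condition that returns the first element matching any selector."""
    def condition(driver):
        for selector in selectors:
            # find_elements returns an empty list instead of raising when nothing matches.
            matches = driver.find_elements(CSS_SELECTOR, selector)
            if matches:
                return matches[0]
        return False  # falsy, so the wait keeps polling
    return condition


# Intended use with a real driver (assumed, not run here):
# element = WebDriverWait(driver, 5).until(any_present([
#     'p[role=status]',
#     '.flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price span + jsl',
# ]))
# print(element.text)
```

WebDriverWait.until accepts any callable that takes the driver, so the condition can be passed to it directly; selectors earlier in the list win when several match at once.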

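Answer 1: (score: 1)

I may be wrong, but I think you are trying to get the flight prices.

If my assumption is correct, take a look at my approach. I locate the "search results" list, then find all the itineraries in that list, loop over them, and read each price. This is the best way I could think of to avoid all the dynamic attributes.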

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC

wait = 20

driver = Chrome()
driver.get("https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-12-17;c:EUR;e:1;a:FR;sd:1;t:f;tt:o")

# Get the Search Result List
search_results= WebDriverWait(driver, wait).until(EC.presence_of_element_located((By.CSS_SELECTOR , 'ol[class="gws-flights-results__result-list"]')))

# Loop through all the itineraries; find_elements(By..., ...) also works on Selenium 4,
# where the find_element(s)_by_* helpers were removed
for result in search_results.find_elements(By.CSS_SELECTOR, 'div[class*="gws-flights-results__collapsed-itinerary"]'):
    price = result.find_element(By.CSS_SELECTOR, 'div[class="gws-flights-results__itinerary-price"]')
    print(price.text)

Output: €18
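If the prices are needed as numbers rather than text, the labels can be converted after they are read. The sketch below is an assumption-laden example: `parse_price` is a hypothetical helper, and it assumes whole-unit prices like the one in the output above, so any separator is treated as a thousands separator and decimals are not supported.

```python
import re


def parse_price(text):
    """Return the integer amount in a price label such as '€18', or None if it has no digits."""
    digits = re.sub(r"\D", "", text)  # drop the currency symbol, spaces and separators
    return int(digits) if digits else None


prices = [parse_price(label) for label in ["€18", "€1,234", ""]]
print(prices)  # [18, 1234, None]
```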