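Selecting from a dropdown menu using Selenium and Python

Date: 2018-07-21 09:54:13

Tags: python selenium web-crawler

I am using Selenium with Python to scrape a web page that has nested dropdown menus. I am posting only the nested part below: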

<div class="dropDown active" data-dropdown-block="FOOTBALL_COMPSEASON" data-dropdown-default="All Seasons">
    <div class="label" id="dd-FOOTBALL_COMPSEASON">Filter by Season</div> 
    <div class="current" data-dropdown-current="FOOTBALL_COMPSEASON" role="button" tabindex="0" aria-expanded="false" aria-labelledby="dd-FOOTBALL_COMPSEASON" data-listen-keypress="true" data-listen-click="true">
        2018/19
    </div>
    <ul class="dropdownList" data-dropdown-list="FOOTBALL_COMPSEASON" role="listbox" aria-labelledby="dd-FOOTBALL_COMPSEASON" data-listen-keypress="true" data-listen-click="true">
        <li role="option" tabindex="0" data-option-name="All Seasons" data-option-id="-1" data-option-index="-1">
             All Seasons
        </li> 
        <li role="option" tabindex="0" data-option-name="2018/19" data-option-id="210" data-option-index="0">
            2018/19
         </li>
         <li role="option" tabindex="0" data-option-name="2017/18" data-option-id="79" data-option-index="1">
              2017/18
         </li>
         <li role="option" tabindex="0" data-option-name="2016/17" data-option-id="54" data-option-index="2">
             2016/17
         </li>
    </ul>
</div>

Here is a screenshot of how it looks (screenshot not preserved):

So I want the crawler to click the dropdown and select 2017/18.

I first tried:

driver.get(_url)
select_element = driver.find_elements_by_class_name("dropdownList")[1]

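Since the class dropdownList is used multiple times in the HTML and the element I need is in the second position, i.e. <ul class="dropdownList"... is the second use of the class dropdownList, I used [1] to get the second element.

But then I got this error: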

  

  File "shots_2017_18.py", line 15, in shots_2017_18
    select_element = driver.find_elements_by_class_name("dropdownList")[1]
IndexError: list index out of range

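What should I change or do so that the crawler can select the 2017/18 item from the dropdown and scrape the page?

2 Answers:

Answer 0: (score: 1)

If you can click the dropdown using Python, you can try the following code:

Update: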

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC 
from selenium.webdriver.common.action_chains import ActionChains
import time


driver   = webdriver.Chrome(executable_path = r'C:/Users/user***/Downloads/chromedriver_win32/chromedriver.exe')
driver.maximize_window()

wait = WebDriverWait(driver,40)

driver.get("https://www.premierleague.com/stats/top/players/goals")  

wait.until(EC.visibility_of_element_located((By.ID, 'dd-FOOTBALL_COMPSEASON')))

time.sleep(5)
drop_down_click = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.current[data-dropdown-current='FOOTBALL_COMPSEASON']")))
drop_down_click.click()

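# Collect every <li> option inside the season dropdown list.
options = driver.find_elements(By.CSS_SELECTOR, "ul[data-dropdown-list='FOOTBALL_COMPSEASON'] li")

for option in options:
    if "2017/18" in option.text.strip():
        option.click()
        break  # stop after the click; the remaining references may go stale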

UPDATE1:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC 
from selenium.webdriver.common.action_chains import ActionChains
import time

driver   = webdriver.Chrome(executable_path = r'C:/Users/user***/Downloads/chromedriver_win32/chromedriver.exe')
driver.maximize_window()

wait = WebDriverWait(driver,40)

driver.get("https://www.premierleague.com/stats/top/players/total_scoring_att")


cookie_button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.btn-primary.cookies-notice-accept")))
ActionChains(driver).move_to_element(cookie_button).perform()  # perform() actually runs the queued action
driver.execute_script('arguments[0].click();', cookie_button)
wait.until(EC.visibility_of_element_located((By.ID, 'dd-FOOTBALL_COMPSEASON')))

time.sleep(5)
drop_down_click = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.current[data-dropdown-current='FOOTBALL_COMPSEASON']")))
drop_down_click.click()

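# Collect every <li> option inside the season dropdown list.
options = driver.find_elements(By.CSS_SELECTOR, "ul[data-dropdown-list='FOOTBALL_COMPSEASON'] li")

for option in options:
    if "2017/18" in option.text.strip():
        option.click()
        break  # stop after the click; the remaining references may go stale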

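Explanation

An explicit wait is code you define to wait for a certain condition to occur before proceeding further in the code. The worst case of this is Thread.sleep() (time.sleep() in Python), which sets the condition to an exact time period to wait. There are some convenience methods provided that help you write code that waits only as long as required. Combining WebDriverWait with ExpectedCondition is one way to accomplish this.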

For more information about explicit waits, see the Selenium documentation.

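Answer 1: (score: 0)

In this case, an index-out-of-range error means that the search found either no elements or only one. Since the code you wrote is correct, I think you entered the wrong URL. If the URL is correct, however, you can use XPath to find the appropriate element. Try the following code: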

select_element = driver.find_element_by_xpath("//li[@data-option-name='2017/18']")
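As a quick offline sanity check, the attribute predicate from this XPath can be tried against the HTML fragment in the question using only the standard library. This is a sketch under assumptions: the markup below is a trimmed copy of the posted fragment, and xml.etree.ElementTree supports only a subset of XPath, so the path is written as .//li rather than //li.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the dropdown markup from the question (assumed well-formed).
HTML_FRAGMENT = """
<ul class="dropdownList" data-dropdown-list="FOOTBALL_COMPSEASON" role="listbox">
    <li role="option" data-option-name="All Seasons" data-option-id="-1">All Seasons</li>
    <li role="option" data-option-name="2018/19" data-option-id="210">2018/19</li>
    <li role="option" data-option-name="2017/18" data-option-id="79">2017/18</li>
    <li role="option" data-option-name="2016/17" data-option-id="54">2016/17</li>
</ul>
""".strip()

root = ET.fromstring(HTML_FRAGMENT)

# Same attribute predicate as the Selenium XPath, relative to the <ul> root.
matches = root.findall(".//li[@data-option-name='2017/18']")

for li in matches:
    print(li.get("data-option-id"), li.text.strip())
```

Locating the element is only half of the job: in the browser the dropdown still has to be opened first, and the located <li> then has to be clicked for the selection to take effect.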