I am using Selenium with Python, and I want to scrape a web page that has a nested dropdown menu. I am posting only the nested part below:
<div class="dropDown active" data-dropdown-block="FOOTBALL_COMPSEASON" data-dropdown-default="All Seasons">
<div class="label" id="dd-FOOTBALL_COMPSEASON">Filter by Season</div>
<div class="current" data-dropdown-current="FOOTBALL_COMPSEASON" role="button" tabindex="0" aria-expanded="false" aria-labelledby="dd-FOOTBALL_COMPSEASON" data-listen-keypress="true" data-listen-click="true">
2018/19
</div>
<ul class="dropdownList" data-dropdown-list="FOOTBALL_COMPSEASON" role="listbox" aria-labelledby="dd-FOOTBALL_COMPSEASON" data-listen-keypress="true" data-listen-click="true">
<li role="option" tabindex="0" data-option-name="All Seasons" data-option-id="-1" data-option-index="-1">
All Seasons
</li>
<li role="option" tabindex="0" data-option-name="2018/19" data-option-id="210" data-option-index="0">
2018/19
</li>
<li role="option" tabindex="0" data-option-name="2017/18" data-option-id="79" data-option-index="1">
2017/18
</li>
<li role="option" tabindex="0" data-option-name="2016/17" data-option-id="54" data-option-index="2">
2016/17
</li>
</ul>
</div>
Here is a screenshot of how it looks:
So I want the crawler to click the dropdown and select 2017/18.
I first tried:
driver.get(_url)
select_element = driver.find_elements_by_class_name("dropdownList")[1]
The class dropdownList is used several times in the HTML, and the element I need, <ul class="dropdownList"...., is the second occurrence of that class, so I used [1] to get the second match.
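But then I get this error:
File "shots_2017_18.py", line 15, in shots_2017_18
    select_element = driver.find_elements_by_class_name("dropdownList")[1]
IndexError: list index out of range
What should I change or do so that the crawler can select the 2017/18 item from the dropdown and scrape the page?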
Answer 0 (score: 1)
If you are able to click the dropdown with Python and Selenium, you can try the following code:
Update:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time
driver = webdriver.Chrome(executable_path = r'C:/Users/user***/Downloads/chromedriver_win32/chromedriver.exe')
driver.maximize_window()
wait = WebDriverWait(driver,40)
driver.get("https://www.premierleague.com/stats/top/players/goals")
wait.until(EC.visibility_of_element_located((By.ID, 'dd-FOOTBALL_COMPSEASON')))
time.sleep(5)
drop_down_click = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.current[data-dropdown-current='FOOTBALL_COMPSEASON']")))
drop_down_click.click()
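# find_elements(By..., ...) works in both Selenium 3 and Selenium 4
options = driver.find_elements(By.CSS_SELECTOR, "ul[data-dropdown-list='FOOTBALL_COMPSEASON'] li")
for option in options:
    if "2017/18" in option.text.strip():
        option.click()
        break  # stop here: the remaining references may go stale after the click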
UPDATE1:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time
driver = webdriver.Chrome(executable_path = r'C:/Users/user***/Downloads/chromedriver_win32/chromedriver.exe')
driver.maximize_window()
wait = WebDriverWait(driver,40)
driver.get("https://www.premierleague.com/stats/top/players/total_scoring_att")
cookie_button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.btn-primary.cookies-notice-accept")))
# an action chain does nothing until perform() is called
ActionChains(driver).move_to_element(cookie_button).perform()
driver.execute_script('arguments[0].click();', cookie_button)
wait.until(EC.visibility_of_element_located((By.ID, 'dd-FOOTBALL_COMPSEASON')))
time.sleep(5)
drop_down_click = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.current[data-dropdown-current='FOOTBALL_COMPSEASON']")))
drop_down_click.click()
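# find_elements(By..., ...) works in both Selenium 3 and Selenium 4
options = driver.find_elements(By.CSS_SELECTOR, "ul[data-dropdown-list='FOOTBALL_COMPSEASON'] li")
for option in options:
    if "2017/18" in option.text.strip():
        option.click()
        break  # stop here: the remaining references may go stale after the click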
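Explanation:
An explicit wait is code you write to wait for a certain condition to occur before execution continues. The crudest form of this is Thread.sleep() (time.sleep() in Python), which makes the condition an exact period of time to wait. Some convenience methods are provided that help you write code that waits only as long as it needs to. Combining WebDriverWait with ExpectedCondition is one way to do this.
For more information on explicit waits, see the Selenium documentation.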
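To make the difference from a fixed sleep concrete, here is a minimal sketch of the polling idea behind an explicit wait. It is not Selenium's implementation: `wait_until` and `options_loaded` are hypothetical names used only for illustration, and the "page" is simulated with a counter so the snippet runs without a browser.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper for illustration; it is not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page: the dropdown options "appear" on the third poll.
polls = {"count": 0}


def options_loaded():
    polls["count"] += 1
    return ["2018/19", "2017/18"] if polls["count"] >= 3 else []


options = wait_until(options_loaded, timeout=5.0, poll_interval=0.01)
print(options)
```

The loop returns as soon as the condition holds, so it waits only as long as it has to, whereas time.sleep(5) always costs the full five seconds and still fails if the page needs six.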
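Answer 1 (score: 0)
In this case, an index-out-of-range error means that the search found either no elements or only one. Since the code you wrote is correct, I think the URL you entered is wrong. If the URL is correct, however, you can use XPath to find the right element. Try the following code: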
select_element = driver.find_element_by_xpath("//li[@data-option-name='2017/18']")
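The diagnosis above can be checked before indexing. The sketch below is only an illustration: `nth_match` is a hypothetical helper, and plain strings stand in for the list of elements that a find_elements call would return, so it runs without a browser.

```python
def nth_match(elements, index, description="element"):
    """Return elements[index], or raise an error that reports how many matched.

    Hypothetical helper for illustration; it is not part of Selenium.
    """
    if not 0 <= index < len(elements):
        raise LookupError(
            f"wanted {description} #{index}, but only {len(elements)} matched; "
            "the page may not have finished loading, or the locator may be wrong"
        )
    return elements[index]


# Stand-in for a search result that found a single match.
found = ["<ul class='dropdownList'> (first)"]

try:
    nth_match(found, 1, description="dropdownList")
except LookupError as error:
    print(error)
```

With a real driver, the same check would be applied to the list returned by the find_elements call. A count of 0 or 1 then points to the page not being ready yet or to a locator that does not match, rather than to the indexing itself.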