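I'm trying to get the PDF files from this website (http://www.motogp.com/en/Results+Statistics/). I'm trying to build a nested loop so that I can step through the years (seasons) and get all the main PDFs for each year.
The problem is that I can't get this line to work (the one that should loop over the years, i.e. the seasons):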
```python
for year in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#season a aria-valuetext"))):
    year.click()
```
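A likely reason this line matches nothing: in CSS a space means "descendant", so `#season a aria-valuetext` asks for an element whose *tag name* is `aria-valuetext` inside an `<a>` inside `#season`. `aria-valuetext` is an attribute, and attribute tests go in brackets, e.g. `#season a[aria-valuetext]`. Since `find_elements` then returns an empty list, `presence_of_all_elements_located` never succeeds and `wait.until` raises a `TimeoutException` after the 10 seconds given to `WebDriverWait`. The standard-library sketch below only illustrates the tag-versus-attribute distinction; the `SAMPLE` markup is hypothetical and is not the real motogp.com HTML.

```python
from html.parser import HTMLParser

# Hypothetical slider markup, for illustration only; the real page may differ.
SAMPLE = """
<div id="season">
  <a id="handle_season" href="#" aria-valuetext="2017"></a>
</div>
"""


class TagAttrCollector(HTMLParser):
    """Record every tag name and every attribute value seen in the markup."""

    def __init__(self):
        super().__init__()
        self.tags = set()
        self.attrs = {}  # attribute name -> list of values, in document order

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)
        for name, value in attrs:
            self.attrs.setdefault(name, []).append(value)


collector = TagAttrCollector()
collector.feed(SAMPLE)

# "aria-valuetext" is never a tag, so "#season a aria-valuetext"
# (which looks for an <aria-valuetext> element) has nothing to match.
print("aria-valuetext" in collector.tags)  # False
# It only appears as an attribute, which CSS matches with [aria-valuetext].
print(collector.attrs["aria-valuetext"])   # ['2017']
```

If, as the `#handle_season` XPath in the first answer suggests, the season is chosen with a single slider handle, even a corrected selector would return one handle rather than one element per season, which is presumably why that answer steps the slider with the arrow keys instead of clicking.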
Here is the full code:
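```python
import os

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

os.chdir("C:..")
driver = webdriver.Chrome("chromedriver.exe")
wait = WebDriverWait(driver, 10)
driver.get("http://www.motogp.com/en/Results+Statistics/")
links = []
# Outer loop over the seasons: this is the selector that does not work
for year in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#season a aria-valuetext"))):
    year.click()
    # Inner loop over the events (drop-down options) of the selected season
    for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#event option"))):
        item.click()
        elem = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "padleft5")))
        print(elem.get_attribute("href"))
        links.append(elem.get_attribute("href"))
        # Wait until this link element is detached (page re-rendered) before the next event
        wait.until(EC.staleness_of(elem))
driver.quit()
```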
This is the earlier post where I got help with the code above:
Answer 1 (score: 2)
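The following solution should work for you. First, we iterate over the years in the CSS slider. Then we use your code sample to process the list of events. I added sleep calls because I kept getting timeouts.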
Code:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time

# Selenium 3 style; Selenium 4 instead takes webdriver.Chrome(service=Service("chromedriver.exe"))
driver = webdriver.Chrome("chromedriver.exe")
wait = WebDriverWait(driver, 10)
driver.get("http://www.motogp.com/en/Results+Statistics/")

# Season slider handle; each ARROW_LEFT below steps it back one season
slider = driver.find_element(By.XPATH, '//*[@id="handle_season"]')
for year in range(68):  # one iteration per season on the slider
    # Wait for the event drop-down of the current season to be present
    wait.until(EC.presence_of_all_elements_located((By.XPATH, '//*[@id="event"]')))
    for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#event option"))):
        item.click()
        elem = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "padleft5")))
        print(elem.get_attribute("href"))
        wait.until(EC.staleness_of(elem))
    # Move to the previous season, then give the page time to reload the events
    slider.send_keys(Keys.ARROW_LEFT)
    time.sleep(1)
driver.quit()
```
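The fixed `time.sleep(1)` after each `ARROW_LEFT` either wastes time or is too short, depending on how fast the page refreshes. A minimal alternative sketch, assuming the season currently shown by the slider can be read back as a string (how to read it on this page is an assumption): poll until the value differs from the one before the key press, with a timeout. `wait_for_change` is a hypothetical helper that uses only the standard library, so its logic can be checked without a browser.

```python
import time


def wait_for_change(read_value, old_value, timeout=10.0, interval=0.5):
    """Poll read_value() until it returns something other than old_value.

    Returns the new value, or raises TimeoutError after `timeout` seconds.
    Hypothetical helper; it mirrors the polling idea behind WebDriverWait.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = read_value()
        if value != old_value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"value stayed {old_value!r} for {timeout} s")
        time.sleep(interval)


# Usage with a fake reader that "updates" on the third poll.
readings = iter(["2017", "2017", "2016"])
print(wait_for_change(lambda: next(readings), "2017", timeout=5, interval=0.01))  # 2016
```

With Selenium, the same check can be handed to the existing `wait` object, because `WebDriverWait.until` accepts any callable that takes the driver, e.g. `wait.until(lambda d: slider.get_attribute("aria-valuetext") != old)`; reading the season from an `aria-valuetext` attribute is again an assumption about the page.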
Answer 2 (score: 1)
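If you are working behind a firewall, your ECs (expected conditions) will often fail to work properly. See whether a `time.sleep(10)` call, used instead of the EC, gets you past that point. Second, check `page_source` before running the EC: if you are behind a firewall, the HTML source will show it.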
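The page-source check can be made concrete by looking for the elements the script depends on before starting any waits. A minimal sketch: `missing_markers` is a hypothetical helper, and the marker strings (`id="event"`, `id="handle_season"`) come from the selectors in the code above, not from any known firewall page. A plain substring test is crude (it assumes double-quoted attributes), but if the markers are absent the browser most likely received something other than the results page, and printing the start of `driver.page_source` shows what it got.

```python
def missing_markers(page_source, markers):
    """Return the expected markup fragments that do not occur in page_source."""
    return [m for m in markers if m not in page_source]


# Fragments the scraping code depends on (taken from its selectors).
EXPECTED = ['id="event"', 'id="handle_season"']

# Made-up stand-in strings; in the real script pass driver.page_source instead.
results_page = '<select id="event"></select><a id="handle_season"></a>'
blocked_page = "<html><body>Access denied by proxy</body></html>"

print(missing_markers(results_page, EXPECTED))  # []
missing = missing_markers(blocked_page, EXPECTED)
if missing:
    # Show what the browser actually received before any WebDriverWait runs.
    print("Unexpected page, missing:", missing)
    print(blocked_page[:200])
```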