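I have been working on this for a few hours without making any progress. I am trying to click the Next button on this page (the URL is in the code below).
Here is my code: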
#!/usr/bin/env python3
import sys
import time
import re
import logging
from selenium import webdriver
from selenium.webdriver.firefox.options import Options as options
from bs4 import BeautifulSoup as bs
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.action_chains import ActionChains
_USE_VIRTUAL_DISPLAY = False
_FORMAT = '%(asctime)s - %(levelname)s - %(name)s - %(message)s'
# logging.basicConfig(filename=LOG_FILENAME,level=logging.DEBUG)
logging.basicConfig(format=_FORMAT, level=logging.INFO)
_LOGGER = logging.getLogger(sys.argv[0])
_DEFAULT_SLEEP = 0.5
try:
    options = options()
    # options.headless = True
    driver = webdriver.Firefox(options=options, executable_path=r"/usr/local/bin/geckodriver")
    print("Started Browser and Driver")
except:
    _LOGGER.info("Can not run headless mode.")
url = 'https://www.govinfo.gov/app/collection/uscourts/district/alsd/2021/%7B%22pageSize%22%3A%22100%22%2C%22offset%22%3A%220%22%7D'
driver.get(url)
time.sleep(5)
page = driver.page_source
soup = bs(page, "html.parser")
next_page = WebDriverWait(driver,5).until(EC.element_to_be_clickable((By.XPATH,'//*[@id="collapseOne1690"]/div/span[1]/div/ul/li[8]/a')))
if next_page:
    print('*****getting next page*****')
    # driver.execute_script('arguments[0].click()', next_page)
    next_page.click()
    time.sleep(3)
else:
    print('no next page')
driver.quit()
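I get a timeout error. I have tried changing the XPath, and I have tried using ActionChains to scroll the element into view, but nothing works. Any help is appreciated.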
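Answer 0: (score: 2)
1. Your XPath does not work because it relies on a dynamic id, collapseOne1690, as already pointed out. Even matching only part of that id would not be very stable.
If you prefer XPath, I suggest //span[@class='custom-paginator']//li[@class='next fw-pagination-btn']/a or simply //li[@class='next fw-pagination-btn']/a. You can also use a CSS selector: .next.fw-pagination-btn
2. I removed the logging code because it has some problems of its own; recheck it.
3. A 5-second explicit wait is too short. Use at least 10 seconds, preferably 15. This is only a suggestion.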
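The difference between the two kinds of locator in point 1 can be sketched without a browser. The markup below is a hypothetical, heavily simplified stand-in for the real page, and `render_page`, `BRITTLE` and `STABLE` are names made up for this illustration. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, so treat it as a rough model rather than a reproduction of what Selenium does.

```python
import xml.etree.ElementTree as ET


def render_page(panel_id):
    """Build a hypothetical, simplified paginator with a per-load panel id."""
    html = f"""
    <body>
      <div id="{panel_id}">
        <span class="custom-paginator">
          <ul>
            <li class="prev fw-pagination-btn"><a>Previous</a></li>
            <li class="next fw-pagination-btn"><a>Next</a></li>
          </ul>
        </span>
      </div>
    </body>
    """
    return ET.fromstring(html.strip())


# Locator tied to the id seen on one particular page load.
BRITTLE = ".//*[@id='collapseOne1690']/span/ul/li[2]/a"
# Locator tied only to class names that stay the same between loads.
STABLE = ".//span[@class='custom-paginator']//li[@class='next fw-pagination-btn']/a"

first_load = render_page("collapseOne1690")
second_load = render_page("collapseOne5168")  # the id changed on reload

brittle_first = first_load.find(BRITTLE)    # matches: the id happens to agree
brittle_second = second_load.find(BRITTLE)  # None: the id no longer exists
stable_second = second_load.find(STABLE)    # still matches the Next link

print(brittle_first is not None, brittle_second is None, stable_second.text)
```

The id-based path only matches on the load where the id happens to be collapseOne1690, while the class-based path keeps matching after the id changes.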
Minimal reproducible code that clicks the button using Firefox:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options as options
from bs4 import BeautifulSoup as bs
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
options = options()
# options.headless = True
driver = webdriver.Firefox(options=options)
print("Started Browser and Driver")
url = 'https://www.govinfo.gov/app/collection/uscourts/district/alsd/2021/%7B%22pageSize%22%3A%22100%22%2C%22offset%22%3A%220%22%7D'
driver.get(url)
page = driver.page_source
soup = bs(page, "html.parser")
print(soup)
next_page = WebDriverWait(driver, 15).until(
    EC.element_to_be_clickable((By.XPATH, "//span[@class='custom-paginator']//li[@class='next fw-pagination-btn']/a")))
next_page.click()
# driver.quit()
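To see why the timeout value matters, it may help to look at what an explicit wait does in principle: it polls a condition until the condition returns something truthy or the timeout runs out. The sketch below is a simplified stand-in, not Selenium's implementation; `wait_until`, `WaitTimeout` and `button_ready` are hypothetical names, and the "button" is just a counter that becomes ready on the third poll.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (stand-in for a timeout error)."""


def wait_until(condition, timeout=15.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Simulate an element that only becomes clickable on the third poll.
attempts = {"count": 0}


def button_ready():
    attempts["count"] += 1
    return "next-button" if attempts["count"] >= 3 else None


found = wait_until(button_ready, timeout=2.0, poll_interval=0.1)
print(found, attempts["count"])
```

A longer timeout costs nothing when the element shows up early, because the loop returns as soon as the condition holds; it only delays the failure when the element never appears.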
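Answer 1: (score: 0)
When I load this page, the div id is assigned dynamically. The first time I loaded the page the id was collapseOne5168, and the second time it was collapseOne1136.
You might consider using find_element_by_class_name("next fw-pagination-btn") instead? Note that a class-name locator expects a single class name, so the two classes would have to be combined into a CSS selector such as .next.fw-pagination-btn.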
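Turning a space-separated class attribute into a CSS selector can be sketched as follows. `classes_to_css` is a hypothetical helper written for this illustration, not part of Selenium, and the `li` tag is an assumption taken from the XPath suggested in the other answer.

```python
def classes_to_css(class_attr, tag=""):
    """Convert a space-separated class attribute into a compound CSS selector."""
    names = class_attr.split()
    if not names:
        raise ValueError("class attribute is empty")
    # Each class becomes ".name"; chaining them requires all classes at once.
    return tag + "".join("." + name for name in names)


selector = classes_to_css("next fw-pagination-btn", tag="li")
print(selector)
```

The resulting string could then be passed to a CSS lookup, for example driver.find_element(By.CSS_SELECTOR, selector + " a"), assuming the link sits inside the li as in the XPath version.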