Question

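I'm writing my first real scraper, and although it has generally gone well, I've hit a wall with Selenium. I can't get it to go to the next page.

Below is the start of my code. The output for now is just the data printed to the terminal, and that all works fine. It simply stops scraping at the end of page 1 and shows me my terminal prompt. It never starts on page 2. I'd be very grateful if anyone could make a suggestion. I have tried selecting the button at the bottom of the page, using both the relative and the full XPath (you see the full XPath here), but neither worked. I'm trying to click the right-arrow button.

I built in my own error message to indicate whether the driver successfully found the element by XPath. The error message fires when I execute my code, so I guess it isn't finding the element. I just can't understand why not.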

# Importing libraries
import requests
import csv
import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Import selenium 
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, WebDriverException
import time

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
driver = webdriver.Chrome("/path/to/driver", options=options)
# Yes, I do have the actual path to my driver in the original code

driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")
time.sleep(5)
while True:
    try:
        driver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div/form/div[3]/div/div/ul[1]/li[4]/a').click()
    except (TimeoutException, WebDriverException) as e:
        print("A timeout or webdriver exception occurred.")
        break
driver.quit()

Answer 1

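What you can do is set up Selenium expected conditions (visibility_of_element_located, element_to_be_clickable) and use a relative XPath to select the next-page element. Do all of this in a loop whose range is the number of pages you have to process.

XPath for the next-page link:

//div[@class='pagination ctm-pagination']/ul[1]/li[last()-1]/a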
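
To see what `li[last()-1]` selects, here is a minimal sketch that runs the same path over a made-up pagination fragment using only Python's standard library. The markup, link texts, and `href` values are hypothetical (the real page may be structured differently), and `xml.etree.ElementTree` supports only a subset of XPath, so the path is written with a leading dot.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup, invented for illustration only.
SAMPLE = """
<html><body>
<div class="pagination ctm-pagination">
  <ul>
    <li><a href="#first">first</a></li>
    <li><a href="#p1" class="state-active">1</a></li>
    <li><a href="#p2">2</a></li>
    <li><a href="#next">next</a></li>
    <li><a href="#last">last</a></li>
  </ul>
</div>
</body></html>
""".strip()

# Same path as in the answer; ElementTree needs ".//" instead of "//".
NEXT_PAGE_XPATH = ".//div[@class='pagination ctm-pagination']/ul[1]/li[last()-1]/a"

root = ET.fromstring(SAMPLE)
next_link = root.find(NEXT_PAGE_XPATH)

# li[last()-1] is the second-to-last <li>, whatever the number of page links.
print(next_link.text, next_link.get("href"))
```

Because the path counts from the end of the list, it keeps pointing at the same item when the number of page links in the middle changes, which a full absolute XPath such as `li[4]` does not.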

Answer 2

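You have the right idea with the while True and try/except logic. To move to the next page using Selenium and Python, you have to induce WebDriverWait for element_to_be_clickable(), and you can use a locator strategy such as the XPath below:

Code block: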

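# Assumes `driver` and the Selenium imports from the question's code.
driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")
next_xpath = "//a[contains(@class, 'state-active')]//following::li[1]/a[@href]"
while True:
    try:
        # Wait until the link after the active page number is clickable, then click it.
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, next_xpath))).click()
        print("Clicked for next page")
        # Wait for the matched link to go stale, i.e. for the page to be re-rendered.
        # find_element(By.XPATH, ...) replaces find_element_by_xpath, which was removed in Selenium 4.3.
        WebDriverWait(driver, 10).until(EC.staleness_of(driver.find_element(By.XPATH, next_xpath)))
    except TimeoutException:
        print("No more pages")
        break
driver.quit()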

Console output:
```
Clicked for next page
No more pages
```

Selenium won't go to the next page in my scraper
