Question

soup = BeautifulSoup(browser.page_source, "html.parser")
for h1 in soup.find_all('h2'):
    try:
        array.append("https://www.chamberofcommerce.com" + h1.find("a")['href'])
        print("https://www.chamberofcommerce.com" + h1.find("a")['href'])
    except:
        pass

input=browser.find_element_by_xpath('//a[@class="next"]')
while input:
    input.click()
    time.sleep(10)
    soup = BeautifulSoup(browser.page_source, "html.parser")

    for h1 in soup.find_all('h2'):
        try:
            array.append("https://www.chamberofcommerce.com" + h1.find("a")['href'])
            print("https://www.chamberofcommerce.com" + h1.find("a")['href'])
        except:
            pass

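This part of the code scrapes the listing URLs from the Yellow Pages site. It worked fine while I was only scraping URLs from the first page of search results. Now I want it to click the "next" button until the search pages are exhausted. For example, if a search returns 20 pages, the Selenium bot should click "next" and scrape the URLs until it reaches page 20.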

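Please review the logic of the code. Also, I get the following error once the bot reaches page 2 (there are actually 15 pages, and it crashes on page 2):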

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

Answer 1

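while input is not what you need... Note that once you click the "next" button a new page is loaded, and all WebElements from the previous page are no longer valid: you have to locate them again on every page. Try the following: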

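from selenium.common.exceptions import NoSuchElementException

while True:
    try:
        # Locate the button again on every page; the old reference goes stale
        browser.find_element_by_xpath('//a[@class="next"]').click()
    except NoSuchElementException:
        # No "next" link on the page: stop paging
        break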

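With the code above you should be able to click the "next" button on every available page. You may also need an explicit wait so that the "next" button is clickable before you click it (here wait is a WebDriverWait instance and EC is selenium.webdriver.support.expected_conditions):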

wait.until(EC.element_to_be_clickable((By.XPATH, '//a[@class="next"]'))).click()
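
The advice above (look the button up again on every page, stop when it is gone) can be combined with the scraping step from the question. The following is only a sketch under assumptions: collect_listing_urls, LINK_XPATH, pause and max_pages are hypothetical names, the links are read through Selenium's find_elements instead of BeautifulSoup, and the string "xpath" is the value of By.XPATH. The fixed pause mirrors the time.sleep from the question; an explicit wait would normally be the better choice.

```python
import time

NEXT_XPATH = '//a[@class="next"]'   # locator taken from the question
LINK_XPATH = '//h2//a[@href]'       # assumption: listing links sit inside <h2>


def collect_listing_urls(browser, pause=10, max_pages=50):
    """Walk the result pages and return every listing URL found."""
    urls = []
    for _ in range(max_pages):      # hard cap so a broken page cannot loop forever
        # Read the links of the page that is currently loaded.
        for link in browser.find_elements("xpath", LINK_XPATH):
            urls.append(link.get_attribute("href"))

        # Look the button up again on every page: the element found on the
        # previous page is stale once the new page has loaded.
        next_buttons = browser.find_elements("xpath", NEXT_XPATH)
        if not next_buttons:
            break                   # no "next" link: last page reached
        next_buttons[0].click()
        time.sleep(pause)           # crude fixed wait, as in the question
    return urls
```

Calling collect_listing_urls(browser) after the first search page has loaded would return the collected links. Selenium usually reports href as an absolute URL, so the site prefix from the question may not be needed here.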

Answer 2

Use an explicit wait:

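from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

...

t = 10  # Timeout in seconds

try:
    # Wait up to t seconds for the "next" link to become clickable
    element = WebDriverWait(driver, t).until(
        EC.element_to_be_clickable((By.XPATH, "//a[@class='next']"))
    )
except TimeoutException:
    # handle element not found or unclickable
    element = None

if element is not None:
    element.click()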

...
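
To make the behaviour of the explicit wait more concrete, here is a rough stand-in written without any Selenium imports. It is not Selenium's implementation, only a sketch of the same idea: poll for a visible, enabled "next" link until a timeout expires. The helper name click_next_when_ready and the poll parameter are hypothetical; the driver is only assumed to offer find_elements, and the elements is_displayed, is_enabled and click.

```python
import time

NEXT_XPATH = "//a[@class='next']"   # locator from the answer


def click_next_when_ready(driver, timeout=10, poll=0.5):
    """Poll for a visible, enabled "next" link; click it and return True.

    Returns False if no clickable link shows up before the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Fresh lookup on every poll, so no old reference is reused.
        for element in driver.find_elements("xpath", NEXT_XPATH):
            if element.is_displayed() and element.is_enabled():
                element.click()
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A paging loop could then read the links of the current page and continue only while click_next_when_ready(driver) returns True. In real code the WebDriverWait form shown in this answer is the usual choice.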

Selenium Python problem: looping over an element
