Question

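I am trying to use the Selenium package to scrape data from the Sunshine List website (http://www.sunshinelist.ca/), but I get the error below. From several related posts I understand that I need to use WebDriverWait to explicitly tell the driver to wait/refresh, but I cannot work out where and how I should call that function.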


StaleElementReferenceException: Message: The element reference of <tr class="even"> is stale: either the element is no longer attached to the DOM or the page has been refreshed

import numpy as np
import pandas as pd
import requests
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

ffx_bin = FirefoxBinary(r'C:\Users\BhagatM\AppData\Local\Mozilla Firefox\firefox.exe')
ffx_caps = DesiredCapabilities.FIREFOX
ffx_caps['marionette'] = True
driver = webdriver.Firefox(capabilities=ffx_caps,firefox_binary=ffx_bin)
driver.get("http://www.sunshinelist.ca/")
driver.maximize_window()

tablewotags1=[]

while True:
    divs = driver.find_element_by_id('datatable-disclosures')
    divs1=divs.find_elements_by_tag_name('tbody')

    for d1 in divs1:
        div2=d1.find_elements_by_tag_name('tr')
        for d2 in div2:
            tablewotags1.append(d2.text)

    try:
        driver.find_element_by_link_text('Next →').click()
    except NoSuchElementException:
        break

year1=tablewotags1[0::10]
name1=tablewotags1[3::10]
position1=tablewotags1[4::10]
employer1=tablewotags1[1::10]  


df1=pd.DataFrame({'Year':year1,'Name':name1,'Position':position1,'Employer':employer1})
df1.to_csv('Sunshine List-1.csv', index=False)

Answer 1

If your problem is clicking the "Next" button, you can do that with an XPath:

driver = webdriver.Firefox(executable_path=r'/pathTo/geckodriver')
driver.get("http://www.sunshinelist.ca/")
wait = WebDriverWait(driver, 20)
el=wait.until(EC.presence_of_element_located((By.XPATH,"//ul[@class='pagination']/li[@class='next']/a[@href='#' and text()='Next → ']")))
el.click()

Answer 2

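Try the code below. When the element is no longer attached to the DOM and a StaleElementReferenceException is raised, search for the element again to get a fresh reference to it.

Note that I verified this with Chrome: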

try:
    driver.find_element_by_css_selector('div[id="datatable-disclosures_wrapper"] li[class="next"]>a').click()
except StaleElementReferenceException:
    driver.find_element_by_css_selector('div[id="datatable-disclosures_wrapper"] li[class="next"]>a').click()
except NoSuchElementException:
    break
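
The re-find-and-retry idea above can be wrapped in a small helper. This is only a minimal sketch and not part of the original answer: `click_with_refind`, `locate`, and `retry_on` are hypothetical names, and the exception type is passed in as a parameter so the helper does not itself depend on Selenium. With Selenium you would presumably pass `StaleElementReferenceException` as `retry_on` and, as `locate`, a lambda that calls the driver's find method. The fake classes exist only to show the behaviour without a browser.

```python
def click_with_refind(locate, retry_on, attempts=3):
    """Locate an element and click it, re-locating if the reference goes stale.

    locate   -- zero-argument callable returning a fresh element on each call
    retry_on -- exception class (or tuple of classes) that signals a stale reference
    attempts -- maximum number of locate-and-click tries
    Returns the number of tries that were needed.
    """
    for attempt in range(1, attempts + 1):
        try:
            # Look the element up again on every try; never reuse an old reference.
            locate().click()
            return attempt
        except retry_on:
            if attempt == attempts:
                raise  # out of tries: let the caller see the error


# Stand-ins for demonstration only (these are not Selenium classes).
class FakeStaleError(Exception):
    pass


class FakeLink:
    def __init__(self, stale):
        self.stale = stale

    def click(self):
        if self.stale:
            raise FakeStaleError("element is stale")


# The first lookup yields a stale link, the second a fresh one.
_links = iter([FakeLink(stale=True), FakeLink(stale=False)])
tries_used = click_with_refind(lambda: next(_links), FakeStaleError)
```

Unlike the single retry in the answer, the number of tries is bounded and configurable, and the last failure is re-raised instead of being swallowed.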

Answer 3

Each time you want to click the "Next" button, you should find the button again and then click it.

Or do something like this:

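max_attempts = 10

while True:
    # find_elements (plural) returns an empty list when nothing matches,
    # whereas find_element raises NoSuchElementException and never returns None
    candidates = self.driver.find_elements_by_css_selector(".next>a")
    if candidates:
        next_link = candidates[0]
        break

    time.sleep(0.5)
    max_attempts -= 1
    if max_attempts == 0:
        self.fail("Cannot find element.")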

Perform the click after this code.

PS: Also try adding time.sleep(x) after finding the element and before the click.

Answer 4

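When a StaleElementReferenceException is raised, it means the content on the site has changed but your list of elements has not. So the trick is to refresh that list every time, inside the loop, as follows: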

while True:
    driver.implicitly_wait(4)
    # Look the rows up again on every pass instead of reusing old references
    for d1 in driver.find_element_by_id('datatable-disclosures').find_element_by_tag_name('tbody').find_elements_by_tag_name('tr'):
        tablewotags1.append(d1.text)

    try:
        driver.switch_to.default_content()
        driver.find_element_by_xpath('//*[@id="datatable-disclosures_wrapper"]/div[2]/div[2]/div/ul/li[7]/a').click()
    except NoSuchElementException:
        # jk aside, put some info there
        print("Don't be so cryptic about error messages, they are good\n"
              "...Script broke clicking next")
        break

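Hope this helps, cheers.

Edit: I visited the site in question. The layout is simple, but the structure is repeated four times, so things are bound to change under you once you start crawling it.

So I edited the code to scrape only one tbody tree, the one from the first datatable-disclosures. I also added some waits.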

Answer 5

Stale element errors can be handled by catching **StaleElementReferenceException**, so that the for loop keeps running when a find_element call inside it hits a stale reference.

from selenium.common import exceptions  

Then adapt your for loop like this:

for item in items:  # placeholder for your own loop
    try:
        driver.find_elements_by_id("data")  # method to find element
        # your code
    except exceptions.StaleElementReferenceException:
        pass

StaleElementReferenceException in Python

5 answers: