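StaleElementReferenceException in Python

Date: 2017-10-31 18:33:53

Tags: python python-3.x selenium web-scraping

I am trying to use the Selenium package to scrape data from the Sunshine List website (http://www.sunshinelist.ca/), but I get the error below. From several related posts I understand that I need to use WebDriverWait to explicitly make the driver wait/refresh, but I cannot work out where and how I should call it.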

StaleElementReferenceException: Message: The element reference of <tr class="even"> is stale: either the element is no longer attached to the DOM or the page has been refreshed

import numpy as np
import pandas as pd
import requests
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

ffx_bin = FirefoxBinary(r'C:\Users\BhagatM\AppData\Local\Mozilla Firefox\firefox.exe')
ffx_caps = DesiredCapabilities.FIREFOX
ffx_caps['marionette'] = True
driver = webdriver.Firefox(capabilities=ffx_caps,firefox_binary=ffx_bin)
driver.get("http://www.sunshinelist.ca/")
driver.maximize_window()

tablewotags1=[]

while True:
    divs = driver.find_element_by_id('datatable-disclosures')
    divs1=divs.find_elements_by_tag_name('tbody')

    for d1 in divs1:
        div2=d1.find_elements_by_tag_name('tr')
        for d2 in div2:
            tablewotags1.append(d2.text)

    try:
        driver.find_element_by_link_text('Next →').click()
    except NoSuchElementException:
        break

year1=tablewotags1[0::10]
name1=tablewotags1[3::10]
position1=tablewotags1[4::10]
employer1=tablewotags1[1::10]  


df1=pd.DataFrame({'Year':year1,'Name':name1,'Position':position1,'Employer':employer1})
df1.to_csv('Sunshine List-1.csv', index=False)

5 answers:

Answer 0 (score: 1)

If your problem is clicking the "Next" button, you can do it with an XPath:

driver = webdriver.Firefox(executable_path=r'/pathTo/geckodriver')
driver.get("http://www.sunshinelist.ca/")
wait = WebDriverWait(driver, 20)
el=wait.until(EC.presence_of_element_located((By.XPATH,"//ul[@class='pagination']/li[@class='next']/a[@href='#' and text()='Next → ']")))
el.click()

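Answer 1 (score: 1)

Please try the code below. When the element is no longer attached to the DOM and a StaleElementReferenceException is raised, search for the element again to obtain a fresh reference to it.

Note that I verified this with Chrome: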

try:
    driver.find_element_by_css_selector('div[id="datatable-disclosures_wrapper"] li[class="next"]>a').click()
except StaleElementReferenceException:
    driver.find_element_by_css_selector('div[id="datatable-disclosures_wrapper"] li[class="next"]>a').click()
except NoSuchElementException:
    break
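
The snippet above retries the click only once. A minimal sketch of the same idea with a bounded number of retries might look like the following. The name `click_with_retry` and its parameters are hypothetical and not part of Selenium; the exception type is passed in as an argument so that the helper can be tried without a browser.

```python
import time


def click_with_retry(find, retry_on, attempts=3, delay=0.5):
    """Look the element up again before every click attempt.

    find     -- zero-argument callable that returns a fresh element
    retry_on -- exception type (or tuple of types) that triggers a retry,
                for example Selenium's StaleElementReferenceException
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, attempts + 1):
        try:
            find().click()
            return attempt
        except retry_on:
            if attempt == attempts:
                raise  # give up and let the caller see the error
            time.sleep(delay)
```

With the selector from this answer, a call could look like `click_with_retry(lambda: driver.find_element_by_css_selector('div[id="datatable-disclosures_wrapper"] li[class="next"]>a'), StaleElementReferenceException)`. A NoSuchElementException still propagates, so the surrounding loop can keep using it to detect the last page.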

Answer 2 (score: 0)

Every time you need to click the "Next" button, find the button again and then click it.

Or do something like this:

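max_attempts = 10

while True:
    # find_elements (plural) returns an empty list when nothing matches,
    # whereas find_element raises NoSuchElementException and never returns None
    next_links = self.driver.find_elements_by_css_selector(".next>a")
    if next_links:
        next_link = next_links[0]
        break
    time.sleep(0.5)
    max_attempts -= 1
    if max_attempts == 0:
        self.fail("Cannot find element.")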

Perform the click after this code.

PS: Also try adding time.sleep(x) after finding the element and before the click.
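
The polling idea in this answer can be wrapped in a small helper with a time limit instead of an attempt counter. This is only a sketch: `wait_for`, `lookup`, `timeout` and `interval` are hypothetical names, and the helper knows nothing about Selenium. It simply calls whatever lookup function it is given until that function returns something truthy.

```python
import time


def wait_for(lookup, timeout=5.0, interval=0.5):
    """Call lookup() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = lookup()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("lookup did not succeed within %.1f s" % timeout)
        time.sleep(interval)
```

A possible call is `wait_for(lambda: driver.find_elements_by_css_selector(".next>a"))`, which returns the non-empty list of matching links. Selenium's own `WebDriverWait(driver, timeout).until(...)` provides the same kind of polling, so this helper is mainly useful for seeing what such a wait does.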

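Answer 3 (score: 0)

When a StaleElementReferenceException is raised, it means that the content of the page has changed but your list of elements has not. So the trick is to refresh that list every time, inside the loop, as follows: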

while True:
    driver.implicitly_wait(4)
    for d1 in driver.find_element_by_id('datatable-disclosures').find_element_by_tag_name('tbody').find_elements_by_tag_name('tr'):
        tablewotags1.append(d1.text)

    try:
        driver.switch_to.default_content()
        driver.find_element_by_xpath('//*[@id="datatable-disclosures_wrapper"]/div[2]/div[2]/div/ul/li[7]/a').click()
    except NoSuchElementException:
        # Joking aside, put some useful information in this message
        print("Don't be so cryptic about error messages, they are good\n"
              "...Script broke clicking next")
        break

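Hope this helps, cheers.

Edit: So I visited the site in question. The layout is simple, but the structure is repeated four times, so things are bound to change while you crawl the site.

So I edited the code to scrape only one tbody tree, the one from the first datatable-disclosures table, and added some waits.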
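
As a rough sketch of "refresh the list every time", the row texts of one page can be read in a single pass and the whole pass repeated if a row goes stale halfway through. `read_rows`, `find_rows` and `retry_on` are hypothetical names, not Selenium API; the exception type is a parameter so that the function can be tried without a browser.

```python
def read_rows(find_rows, retry_on, attempts=3):
    """Return the text of every row, querying the rows again if they go stale.

    find_rows -- zero-argument callable that returns the current row elements
    retry_on  -- exception type that signals a stale element
    """
    for attempt in range(1, attempts + 1):
        try:
            # Read all texts in one pass so that no element reference is kept
            return [row.text for row in find_rows()]
        except retry_on:
            if attempt == attempts:
                raise  # the rows kept changing; let the caller decide
```

With the lookups from the code above, a call could be `tablewotags1.extend(read_rows(lambda: driver.find_element_by_id('datatable-disclosures').find_element_by_tag_name('tbody').find_elements_by_tag_name('tr'), StaleElementReferenceException))`. A partially read page is discarded and read again, so no row ends up in the list twice.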

Answer 4 (score: 0)

Stale element errors can be handled by catching StaleElementReferenceException, so that the for loop keeps running when you look up the element with any find_element method inside the loop.

from selenium.common import exceptions  

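and adapt your for loop like this:

for item in items:  # `items` is a placeholder for whatever your loop iterates over
    try:
        elements = driver.find_elements_by_id("data")  # method that finds the element
        # your code
    except exceptions.StaleElementReferenceException:
        pass  # skip this iteration and continue with the next one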