Looping through pages and getting StaleElementReferenceException in Python Selenium

Time: 2017-02-27 01:56:43

Tags: python selenium selenium-webdriver web-scraping

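I'm iterating over a bunch of web pages. For now, all of them share the same structure, including a back button and a forward button, (//span/a)[2]. For some reason I can get through the first page (sometimes the second), but then I keep getting a StaleElementReferenceException.

Here is the relevant code: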

for x in range(0,5):
    print 'page %d' %(x)
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "(//span/a)[2]"))
    )
    listItems = driver.find_elements_by_xpath("//td[@class='CourseCode']/a")
    for element in listItems:
        elementText = element.text
        print(elementText)
        writeFile.write(element.text + '\n')
    driver.find_element_by_xpath("(//span/a)[2]").click()

Specifically, here is the stack trace:

Traceback (most recent call last):
File "getList.py", line 21, in lookup
addListItems(driver, courseCodeFile)
File "getList.py", line 44, in addListItems
elementText = element.text
File "/home/francisco/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 73, in text
return self._execute(Command.GET_ELEMENT_TEXT)['value']
File "/home/francisco/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 494, in _execute
return self._parent.execute(command, params)
File "/home/francisco/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
self.error_handler.check_response(response)
File "/home/francisco/.local/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
raise exception_class(message, screen, stacktrace)
StaleElementReferenceException: Message: The element reference is stale. Either the element is no longer attached to the DOM or the page has been refreshed.

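I've tried a bunch of things to no avail. Strangely, without the loop I can get the function to work for two pages.

Before the runtime error, it prints the text of the first 2-3 elements of the listItems obtained from the previous page.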

1 Answer:

Answer 0: (score: 1)

You can use the stalenessOf expected condition (staleness_of in the Python bindings) with WebDriverWait to avoid the StaleElementReferenceException.

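StaleElementReferenceException occurs in two common cases:

  1. The element has been deleted entirely.
  2. The element is no longer attached to the DOM.

In your case, because the same locators are used on every page, right after the click Selenium can still resolve them against the previous page (the DOM is not yet updated and still refers to the old page). Elements found that way become stale as soon as the new page replaces the old one.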

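A simple solution is to add a time.sleep after the click at the end of each loop iteration, so that the DOM has time to update and the locators are applied to the new page's DOM.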

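    import time

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # driver and writeFile are assumed to be set up as in the question.
    for x in range(0, 5):
        print('page %d' % x)
        # Wait until the "next page" link is present.
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, "(//span/a)[2]"))
        )
        listItems = driver.find_elements_by_xpath("//td[@class='CourseCode']/a")
        for element in listItems:
            elementText = element.text
            print(elementText)
            writeFile.write(elementText + '\n')
        driver.find_element_by_xpath("(//span/a)[2]").click()
        time.sleep(0.5)  # pause 0.5 seconds so the next page can load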
    

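Another solution is to wait for an element that is unique to each page. This may not be practical inside a for loop (although it is possible if you use if-else, indexing, and so on).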
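
The stalenessOf idea mentioned at the start of this answer could be sketched roughly as follows, as an alternative to a fixed sleep. This is only a sketch under assumptions: the names collect_course_codes, selenium_page_waiter, and wait_for_new_page are hypothetical helpers, not Selenium APIs, and the XPaths are the ones from the question. It keeps the question's find_element(s)_by_xpath calls, which newer Selenium 4 releases no longer provide; there you would use driver.find_elements(By.XPATH, ...) instead.

```python
COURSE_LINK_XPATH = "//td[@class='CourseCode']/a"
NEXT_LINK_XPATH = "(//span/a)[2]"


def collect_course_codes(driver, wait_for_new_page, pages=5):
    """Collect the course link texts from `pages` consecutive pages.

    wait_for_new_page is a callable that receives the old "next" link
    and blocks until the page it belonged to has been replaced.
    """
    codes = []
    for _ in range(pages):
        links = driver.find_elements_by_xpath(COURSE_LINK_XPATH)
        # Read the text immediately, while the elements belong to the live page.
        codes.extend(link.text for link in links)
        next_link = driver.find_element_by_xpath(NEXT_LINK_XPATH)
        next_link.click()
        # Do not touch the page again until the old link has gone stale.
        wait_for_new_page(next_link)
    return codes


def selenium_page_waiter(driver, timeout=10):
    """Build a wait_for_new_page callable from Selenium's expected conditions."""
    # Imported here so collect_course_codes has no hard dependency on Selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def wait_for_new_page(old_element):
        wait = WebDriverWait(driver, timeout)
        # The old link becoming stale means the old page has been replaced.
        wait.until(EC.staleness_of(old_element))
        # Then wait until the new page's "next" link is present.
        wait.until(EC.presence_of_element_located((By.XPATH, NEXT_LINK_XPATH)))

    return wait_for_new_page
```

With a real driver this would be called as collect_course_codes(driver, selenium_page_waiter(driver)). Note that, like the loop in the question, it clicks the forward link after every page, so on the last available page the wait would time out if no further page loads.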