Selenium error: Element not found in the cache - perhaps the page has changed since it was looked up

Asked: 2015-06-12 11:22:44

Tags: python selenium web-scraping

I am extracting the first "Name" field on every page of this URL: http://www.srlworld.com/content/65/find-a-lab.html

The for loop runs once and then throws an error:

File "srl.py", line 40, in <module>
    print state.text
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 66, in text
    return self._execute(Command.GET_ELEMENT_TEXT)['value']
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 404, in _execute
    return self._parent.execute(command, params)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 195, in execute
    self.error_handler.check_response(response)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 170, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: Element not found in the cache - perhaps the page has changed since it was looked up
Stacktrace:
    at fxdriver.cache.getElementAt (resource://fxdriver/modules/web-element-cache.js:8981)
    at Utils.getElementAt (file:///tmp/tmpPEHToH/extensions/fxdriver@googlecode.com/components/command-processor.js:8574)
    at WebElement.getElementText (file:///tmp/tmpPEHToH/extensions/fxdriver@googlecode.com/components/command-processor.js:11722)
    at DelayedCommand.prototype.executeInternal_/h (file:///tmp/tmpPEHToH/extensions/fxdriver@googlecode.com/components/command-processor.js:12282)
    at fxdriver.Timer.prototype.setTimeout/<.notify (file:///tmp/tmpPEHToH/extensions/fxdriver@googlecode.com/components/command-processor.js:603)

The code is:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select

driver = webdriver.Firefox()
driver.get("http://www.srlworld.com/content/65/find-a-lab.html")
#assert "http" in driver.title

elem = driver.find_element_by_id("country")
#driver.implicitly_wait(5)
all_countries = elem.find_elements_by_tag_name("option")
country = all_countries[1]
print "country value is %s" % country.get_attribute("value")
country.click()
driver.implicitly_wait(2)

state_elem = driver.find_element_by_id("state")
all_states = state_elem.find_elements_by_tag_name("option")
del all_states[0]


for state in all_states:
    print "start ",
    print state.text

    print "state value is %s" % state.get_attribute("value")
    state.click()
    driver.implicitly_wait(2)

    driver.find_element_by_name("go").click()

    name = driver.find_element_by_xpath("//div[span='Name'][1]/span/following-sibling::span[2]")
    print name.text
    print "end ",
    print state.text

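When I run this script, the for loop gets through only one pass and fails on the last 'state.text' print, even though I am not changing 'state' anywhere.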

1 Answer:

Answer 0: (score: 1)

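Judging by the exception text, this is what happens: every time you press the "Go" button, the page is refreshed (the new data is loaded by an actual page reload, not via AJAX, and that matters). The elements you looked up before the reload belong to the old page, so Selenium raises an exception as soon as you try to use one of them. I suggest the following algorithm to solve your problem: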

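current_position = 1

while True:
    # Look the dropdown up again on every pass: references found before
    # a page reload are stale afterwards.
    state_elem = driver.find_element_by_id("state")
    all_states = state_elem.find_elements_by_tag_name("option")
    if current_position >= len(all_states):
        break  # every state has been visited

    state = all_states[current_position]
    # Read the text now; `state` itself goes stale once "go" reloads the page.
    state_text = state.text
    print "start ",
    print state_text

    print "state value is %s" % state.get_attribute("value")
    state.click()
    # No implicitly_wait() here: it sets a lookup timeout for the whole
    # session (already done earlier in the script); it is not a pause.

    driver.find_element_by_name("go").click()

    name = driver.find_element_by_xpath("//div[span='Name'][1]/span/following-sibling::span[2]")
    print name.text
    print "end ",
    print state_text
    current_position += 1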

This way you pick the next option from the freshly loaded page on every pass, and you should no longer get the exception.
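
A variation on the same idea, as a rough sketch rather than a tested fix for this particular site: read each option's value and text into plain strings once (strings cannot go stale), then look the option up again by value on every freshly loaded page. The function name scrape_first_names is made up for illustration. The sketch assumes the "state" dropdown is still present and populated after each reload, and it passes locator strategies as the plain strings behind By.ID, By.NAME, By.TAG_NAME and By.XPATH so that it needs no imports.

```python
def scrape_first_names(driver):
    """Return (state label, first 'Name' value) pairs, one per state option.

    `driver` is assumed to be a Selenium WebDriver that already shows the
    page with a country selected. "id", "name", "tag name" and "xpath"
    are the string values of By.ID, By.NAME, By.TAG_NAME and By.XPATH.
    """
    name_xpath = "//div[span='Name'][1]/span/following-sibling::span[2]"

    # Collect plain strings up front; unlike element references,
    # they stay usable after the page reloads.
    dropdown = driver.find_element("id", "state")
    options = dropdown.find_elements("tag name", "option")[1:]  # skip the placeholder
    states = [(o.get_attribute("value"), o.text) for o in options]

    results = []
    for value, label in states:
        # Find the option again on the page as it exists right now.
        dropdown = driver.find_element("id", "state")
        for option in dropdown.find_elements("tag name", "option"):
            if option.get_attribute("value") == value:
                option.click()
                break
        driver.find_element("name", "go").click()  # triggers the reload
        first_name = driver.find_element("xpath", name_xpath).text
        results.append((label, first_name))
    return results
```

Called as scrape_first_names(driver) after the country has been chosen, it returns the collected pairs instead of printing them, so nothing in the loop touches an element that was found before the most recent reload.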