I'm using Python Selenium to extract all the links from a page. I chose Selenium because many of the sites I'm targeting use JS to render the DOM. However, Selenium throws a stale-element exception while iterating over the page's <a> elements.
from selenium.webdriver import Firefox

def request(url):
    urls = []
    browser = Firefox()
    browser.implicitly_wait(7)
    browser.get(url)
    elements = browser.find_elements_by_xpath("//a")
    for link in elements:
        href = link.get_attribute('href')
        urls.append(href)
    browser.close()
    return urls
Here is the error Selenium throws:
  File "/var/extractor/main.py", line 80, in request
    href = link.get_attribute('href')
  File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/webelement.py", line 113, in get_attribute
    resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
  File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/webelement.py", line 469, in _execute
    return self._parent.execute(command, params)
  File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/webdriver.py", line 201, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: Element not found in the cache - perhaps the page has changed since it was looked up
So, to try to work around this, I wrapped the get_attribute('href') call in a try/except:
from selenium.common.exceptions import StaleElementReferenceException

for link in elements:
    try:
        href = link.get_attribute('href')
    except StaleElementReferenceException:
        continue
    urls.append(href)
But this doesn't work either! The application never exits the for loop; it just hangs forever. So I'm not really sure what to do here. I've read plenty of posts about the stale element exception, but none where it hangs forever. I feel like the whole StaleElementReferenceException is a design flaw in Selenium, and it should be able to fail more gracefully. Any advice on how to work around this would be greatly appreciated.
[Edit]: Until I get this fixed, I've been sending page_source to lxml to parse the links instead of Selenium. That does solve the problem, but I'd rather not introduce another dependency and would prefer to fix the issue in Selenium.
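For reference, my lxml fallback looks roughly like this. A minimal sketch: extract_links is just a name I made up for the helper, and resolving relative hrefs with urljoin is an extra I added so the results match what get_attribute('href') returns (absolute URLs):

```python
from urllib.parse import urljoin

from lxml import html  # third-party: pip install lxml

def extract_links(page_source, base_url):
    # lxml parses a static snapshot of the DOM, so nothing can go stale
    tree = html.fromstring(page_source)
    # make relative hrefs absolute, like Selenium's get_attribute('href')
    return [urljoin(base_url, href) for href in tree.xpath('//a/@href')]

# Inside request(), after browser.get(url):
#     urls = extract_links(browser.page_source, url)
#     browser.close()
```

The key point is that page_source is grabbed once as a plain string, so the links are parsed offline and no live WebElement references are held while the page keeps mutating.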