Visiting each page from a drop-down menu one by one - can't get past page 2

Date: 2015-07-28 13:52:26

Tags: javascript python selenium selenium-webdriver web-scraping

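I have a page (the one loaded by driver.get() in the code below).

Starting from the drop-down menu at the top of that page, I want to visit every page associated with an entry in the menu (to collect its URL).

I'm new to Selenium, so I'm trying some preliminary steps first: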

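  • Open the driver
  • Point it at the web page
  • Select the drop-down menu
  • Pick an arbitrary "name", e.g. the option with value = 2
  • Land on that page, grab its URL and print it
  • Pick another arbitrary "name", the option with value = 3: this is where it fails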

The code I'm using:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
import time

driver = webdriver.Firefox()
driver.get("http://www.hillsproducts.com/General.aspx/en-GB/PD/a-d-canine/original/can")
select = Select(driver.find_element_by_xpath("//select[@id='productSpecifier_product']"))
value="2"
select.select_by_value(value)
print(driver.current_url)
time.sleep(10)
value="3"
select.select_by_value(value)
print(driver.current_url)

There is something I'm not getting. This is the error I get:

Traceback (most recent call last):
  File "/Users/Luigi/Desktop/selenium_attempt.py", line 19, in <module>
    select.select_by_value(value)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/support/select.py", line 76, in select_by_value
    opts = self._el.find_elements(By.CSS_SELECTOR, css)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/webelement.py", line 44, in find_elements
    {"using": by, "value": value})['value']
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/webelement.py", line 447, in _execute
    return self._parent.execute(command, params)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/webdriver.py", line 193, in execute
    self.error_handler.check_response(response)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/errorhandler.py", line 181, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: Element not found in the cache - perhaps the page has changed since it was looked up
Stacktrace:
    at fxdriver.cache.getElementAt (resource://fxdriver/modules/web-element-cache.js:9348)
    at Utils.getElementAt (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpr37ozu9l/extensions/fxdriver@googlecode.com/components/driver-component.js:8942)
    at FirefoxDriver.prototype.findElementsInternal_ (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpr37ozu9l/extensions/fxdriver@googlecode.com/components/driver-component.js:10685)
    at FirefoxDriver.prototype.findChildElements (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpr37ozu9l/extensions/fxdriver@googlecode.com/components/driver-component.js:10706)
    at DelayedCommand.prototype.executeInternal_/h (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpr37ozu9l/extensions/fxdriver@googlecode.com/components/command-processor.js:12643)
    at DelayedCommand.prototype.executeInternal_ (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpr37ozu9l/extensions/fxdriver@googlecode.com/components/command-processor.js:12648)
    at DelayedCommand.prototype.execute/< (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpr37ozu9l/extensions/fxdriver@googlecode.com/components/command-processor.js:12590)

Any ideas would be greatly appreciated!

Update after Alex's answer: running the code from the answer, I now get:

Traceback (most recent call last):
  File "/Users/Luigi/Desktop/selenium_attempt.py", line 18, in <module>
    if index >= len(select.options):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/support/select.py", line 46, in options
    return self._el.find_elements(By.TAG_NAME, 'option')
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/webelement.py", line 44, in find_elements
    {"using": by, "value": value})['value']
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/webelement.py", line 447, in _execute
    return self._parent.execute(command, params)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/webdriver.py", line 193, in execute
    self.error_handler.check_response(response)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/errorhandler.py", line 181, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: Element not found in the cache - perhaps the page has changed since it was looked up
Stacktrace:
    at fxdriver.cache.getElementAt (resource://fxdriver/modules/web-element-cache.js:9348)
    at Utils.getElementAt (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpzrilw39c/extensions/fxdriver@googlecode.com/components/driver-component.js:8942)
    at FirefoxDriver.prototype.findElementsInternal_ (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpzrilw39c/extensions/fxdriver@googlecode.com/components/driver-component.js:10685)
    at FirefoxDriver.prototype.findChildElements (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpzrilw39c/extensions/fxdriver@googlecode.com/components/driver-component.js:10706)
    at DelayedCommand.prototype.executeInternal_/h (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpzrilw39c/extensions/fxdriver@googlecode.com/components/command-processor.js:12643)
    at DelayedCommand.prototype.executeInternal_ (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpzrilw39c/extensions/fxdriver@googlecode.com/components/command-processor.js:12648)
    at DelayedCommand.prototype.execute/< (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpzrilw39c/extensions/fxdriver@googlecode.com/components/command-processor.js:12590)

1 answer:

Answer (score: 1)

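You have to re-create the Select() every time a new page is loaded. Selecting an option navigates to a new page, and every element found on the old page, including the <select> wrapped by your Select object, becomes stale: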

from selenium import webdriver
from selenium.webdriver.support.ui import Select


driver = webdriver.Firefox()
driver.get("http://www.hillsproducts.com/General.aspx/en-GB/PD/a-d-canine/original/can")

index = 0
while True:
    select = Select(driver.find_element_by_id("productSpecifier_product"))

    # exit the loop if all the options were seen
    if index >= len(select.options):
        break

    select.select_by_index(index)
    print(driver.current_url)

    index += 1
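The update above suggests the loop can still re-find the <select> before the browser has actually left the old page, so the "fresh" element goes stale too. Below is a minimal sketch, not taken from the answer, of a more defensive variant under these assumptions: the driver has the find_element(by, value) API, the plain strings "id", "tag name" and "css selector" are the values of By.ID, By.TAG_NAME and By.CSS_SELECTOR, and the helper names (option_values, wait_until_stale, collect_option_urls) are hypothetical. It reads all option values once as plain strings, re-finds the <select> on every page, and waits until the old <select> is detached before touching the page again.

```python
import time

try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:  # hypothetical stand-in so the helpers import without Selenium
    class StaleElementReferenceException(Exception):
        """Placeholder used only when Selenium is not installed."""


def option_values(driver, select_id):
    """Read every option value once; plain strings can never go stale."""
    select_el = driver.find_element("id", select_id)  # "id" == By.ID
    return [opt.get_attribute("value")
            for opt in select_el.find_elements("tag name", "option")]


def wait_until_stale(element, timeout=10.0, poll=0.25):
    """Block until `element` is detached from the DOM (the old page is gone).

    Same idea as expected_conditions.staleness_of: any call on a detached
    element raises StaleElementReferenceException.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            element.is_enabled()
        except StaleElementReferenceException:
            return
        time.sleep(poll)
    raise TimeoutError("page did not reload after selecting an option")


def collect_option_urls(driver, select_id, timeout=10.0):
    """Return {option value: URL of the page that option leads to}."""
    urls = {}
    for value in option_values(driver, select_id):
        # Look the <select> up again on every page; old references are stale.
        select_el = driver.find_element("id", select_id)
        if select_el.get_attribute("value") == value:
            # Already on this option's page; re-selecting it would not reload.
            urls[value] = driver.current_url
            continue
        selector = 'option[value="{}"]'.format(value)
        select_el.find_element("css selector", selector).click()
        # Don't query the page until the old <select> is gone, otherwise the
        # next lookup may still hit the page that is being unloaded.
        wait_until_stale(select_el, timeout=timeout)
        urls[value] = driver.current_url
    return urls
```

With a real browser you would call collect_option_urls(driver, "productSpecifier_product") right after driver.get(...) and print the returned dict. With Selenium installed, WebDriverWait(driver, 10).until(expected_conditions.staleness_of(select_el)) can replace the hand-written wait_until_stale. If the new page builds the <select> late, an explicit wait for its presence before the next find_element may also be needed.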