Python Selenium site spider driven by a page's option list

Time: 2013-09-18 03:18:26

Tags: python selenium selenium-webdriver option html-select

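I am using Selenium with Python 2.7 and am trying to use a select box's list of options as the basis for my site-spider function. Let's go straight to the code: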

 select = self.br.find_element_by_name( field )  #get the select element            
 options = select.find_elements_by_tag_name("option") #get all the options into a list

 for option in options: #iterate over the options
     print "starting loop on option %s" % option.text

     #now get the option with the value that is currently being iterated over and select it from the original select box source
     self.br.find_element_by_xpath("//select[@name='%s']/option[@value='%s']" % ( field, option.get_attribute("value") ) ).click() #the click takes you to a new page

     source = self.br.page_source #get the new page source

     #now check to see if some required data is on the navigated page, and print some stuff if so
     if "There is no summary data available." not in source:
          print "the new page is good! Here are the original args: ", option.text, option.get_attribute("value") 
     #time to go back to the main page and click the next option element
     self.br.back()
     print "went backwards" #for debugging

Everything works until the second iteration, when the loop starts again after self.br.back(). At that point I get a very long Selenium error:

 selenium.common.exceptions.StaleElementReferenceException: Message: u'Element not found in the cache - perhaps the page has changed since it was looked up' ; Stacktrace: 
at fxdriver.cache.getElementAt (resource://fxdriver/modules/web_element_cache.js:7643)
at Utils.getElementAt (file:///tmp/tmpm_ciQJ/extensions/fxdriver@googlecode.com/components/command_processor.js:7232)
at WebElement.getElementAttribute (file:///tmp/tmpm_ciQJ/extensions/fxdriver@googlecode.com/components/command_processor.js:10335)
at DelayedCommand.prototype.executeInternal_/h (file:///tmp/tmpm_ciQJ/extensions/fxdriver@googlecode.com/components/command_processor.js:10840)
at DelayedCommand.prototype.executeInternal_ (file:///tmp/tmpm_ciQJ/extensions/fxdriver@googlecode.com/components/command_processor.js:10845)
at DelayedCommand.prototype.execute/< (file:///tmp/tmpm_ciQJ/extensions/fxdriver@googlecode.com/components/command_processor.js:10787) 

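The error evidently says that the element may no longer exist, but since I am only iterating over a list of objects retrieved during the previous page session, how can that be?

In any case, how should I go about this? Perhaps the way I am attempting it is not the best way...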

1 Answer:

Answer 0 (score: 1)

I am not very familiar with Python, so you may need to rework this a little. I think it will at least get you started.

from selenium.webdriver.support.ui import Select, WebDriverWait

select = self.br.find_element_by_name( field )  #get the select element            
options = select.find_elements_by_tag_name("option") #get all the options into a list

optionsList = []

for option in options: #iterate over the options, place attribute value in list
    optionsList.append(option.get_attribute("value"))

for optionValue in optionsList:
    print "starting loop on option %s" % optionValue

    select = Select(self.br.find_element_by_name( field ))
    select.select_by_value(optionValue)

    source = self.br.page_source #get the new page source

    #now check to see if some required data is on the navigated page, and print some stuff if so
    if "There is no summary data available." not in source:
         print "the new page is good! Here are the original args: ", optionValue
    #time to go back to the main page and click the next option element
    self.br.back()
    print "went backwards" #for debugging

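The idea is to build a list of option values in the first for loop, then iterate over those values in the second for loop to navigate to each second page. The values are plain strings, so they remain usable after the browser navigates, whereas the option elements found on the earlier page load become stale. Each value is selected with the Select helper class from Selenium's Python bindings. On every pass through the second for loop, I obtain a fresh reference to the drop-down before selecting from it.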
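The same pattern can be sketched as two small functions. This is only a sketch under assumptions: `collect_option_values`, `crawl_options`, `driver`, `field` and `marker` are names made up for illustration, and the locators are passed as plain strings ("name", "tag name", "xpath") to the generic `find_element(by, value)` form rather than through the `find_element_by_*` helpers used above. `FakeDriver`, `FakeElement` and `FakeStale` are not part of Selenium; they are stand-ins that imitate stale references so the loop can be dry-run without a browser.

```python
def collect_option_values(driver, field):
    """Read every option value up front, as plain strings."""
    select = driver.find_element("name", field)
    options = select.find_elements("tag name", "option")
    return [opt.get_attribute("value") for opt in options]


def crawl_options(driver, field, marker="There is no summary data available."):
    """Visit each option's page; return the values whose page lacks `marker`."""
    good = []
    for value in collect_option_values(driver, field):
        # Re-locate the option on every pass: elements found before the
        # previous navigation can no longer be used.
        xpath = "//select[@name='%s']/option[@value='%s']" % (field, value)
        driver.find_element("xpath", xpath).click()
        if marker not in driver.page_source:
            good.append(value)
        driver.back()
    return good


# --- Hypothetical stand-ins for a dry run; NOT part of Selenium ---
class FakeStale(Exception):
    """Imitates a stale element reference error."""


class FakeElement(object):
    def __init__(self, driver, value=None):
        self.driver, self.value = driver, value
        self.generation = driver.generation  # page load this element belongs to

    def _check(self):
        if self.generation != self.driver.generation:
            raise FakeStale("element belongs to an earlier page load")

    def get_attribute(self, name):
        self._check()
        return self.value

    def find_elements(self, by, value):
        self._check()
        return [FakeElement(self.driver, v) for v in sorted(self.driver.pages)]

    def click(self):
        self._check()
        self.driver.load(self.driver.pages[self.value])


class FakeDriver(object):
    def __init__(self, pages):
        self.pages = pages  # option value -> source of the page it leads to
        self.generation = 0
        self.page_source = "<select>"

    def load(self, source):
        self.generation += 1  # every navigation invalidates older elements
        self.page_source = source

    def find_element(self, by, value):
        if by == "xpath":
            value = value.split("@value='")[1].split("'")[0]
            return FakeElement(self, value)
        return FakeElement(self)

    def back(self):
        self.load("<select>")
```

With a real browser, `crawl_options(self.br, field)` would replace the second loop. If the page after `back()` takes time to load, an explicit wait before re-locating the option may also be needed; that is left out here.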

I hope this helps.