Question

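I'm trying to extract some data from a website. However, the site has a hierarchical structure: at the top there is a dropdown menu whose option values are URLs. So my approach is to: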

1. find the dropdown box,
2. select an option,
3. extract some data,
4. repeat steps 2 and 3 for all available options.

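Below is my code. I am able to extract the data under the option selected by default (the first one), but after that I get the error Message: Element not found in the cache - perhaps the page has changed since it was looked up. It looks as if my browser is not switching to the new page. I tried time.sleep() and driver.refresh(), but neither helped... Any suggestions are appreciated!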

### html
<select class="form-control"> 
    <option value="/en/url1">001 Key</option>
    <option value="/en/url2">002 Key</option>
</select>

### python code

# select the dropdown menu
select_box = Select(driver.find_element_by_xpath("//select[@class='form-control']"))
# get all options
options = select_box.options

for ele_index, element in enumerate(options):
    # select a url
    select_box.select_by_index(ele_index)
    time.sleep(5)
    print element.text

    # extract page data
    id_comp_html = driver.find_elements_by_class_name('HorDL')
    for q in id_comp_html:
        print q.get_attribute("innerHTML")
        print "============="

Update 1 (based on alecxe's solution)

# dropdown menu
select_box = Select(driver.find_element_by_xpath("//select[@class='form-control']"))
options = select_box.options

for ele_index in range(len(options)):
    # select a url
    select_box = Select(driver.find_element_by_xpath("//select[@class='form-control']"))
    print select_box.options[ele_index].text
    select_box.select_by_index(ele_index)
    # print element.text
    # print "======"
    driver.implicitly_wait(5)
    id_comp_html = driver.find_elements_by_class_name('HorDL')
    for q in id_comp_html:
        print q.get_attribute("innerHTML")
        print "============="

Answer 1

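Your select_box and element references have gone stale. You have to "re-find" the select element on every pass of the loop and work with option indexes rather than stored elements: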

# select the dropdown menu
select_box = Select(driver.find_element_by_xpath("//select[@class='form-control']"))
# get all options
options = select_box.options

for ele_index in range(len(options)):
    # select a url
    select_box = Select(driver.find_element_by_xpath("//select[@class='form-control']"))
    select_box.select_by_index(ele_index)

    # ...
    element = select_box.options[ele_index]

You may also need to navigate back after selecting an option and extracting the desired data. This can be done with driver.back().
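The re-find-then-go-back idea can be sketched as a small function. This is only an illustration under assumptions: the names scrape_by_index, find_select, locate and go_back are hypothetical, the locators are copied from the question, and the string "class name" is the value of Selenium's By.CLASS_NAME constant (just as "xpath" is By.XPATH). Whether driver.back() is actually needed depends on whether the page you land on still contains the dropdown.

```python
SELECT_XPATH = "//select[@class='form-control']"


def find_select(driver):
    """Look the dropdown up afresh and wrap it in Selenium's Select helper."""
    # Imported here so the loop below can be exercised without a browser.
    from selenium.webdriver.support.ui import Select
    return Select(driver.find_element("xpath", SELECT_XPATH))


def scrape_by_index(driver, locate=find_select, go_back=False):
    """Select every option by index, re-finding the dropdown each time."""
    results = []
    # Only the number of options is kept; stored elements would go stale.
    count = len(locate(driver).options)
    for index in range(count):
        select_box = locate(driver)  # fresh reference on every pass
        label = select_box.options[index].text  # read before the page changes
        select_box.select_by_index(index)
        blocks = driver.find_elements("class name", "HorDL")
        results.append((label, [b.get_attribute("innerHTML") for b in blocks]))
        if go_back:
            driver.back()  # return to the page that holds the dropdown
    return results
```

With a real browser you would call scrape_by_index(driver) after opening the page. If the new page loads slowly, an explicit wait (for example WebDriverWait from selenium.webdriver.support.ui) before reading the HorDL elements is probably more reliable than a fixed sleep.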

Answer 2

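Judging by your description of the site and your code, selecting an option from the dropdown sends you to another page. So after the first iteration of the for loop you have already moved to a different page, while your options variable still points to elements on the previous page.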

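A solution specific to your case (and probably the best one here) is to store the option values (i.e. the URLs) and navigate to them directly with the .get() method.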
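A minimal sketch of that idea, assuming the option values are paths relative to the current page as in the HTML from the question. The function names collect_option_urls and scrape_all are hypothetical, and the locator strings "xpath" and "class name" are the values of Selenium's By.XPATH and By.CLASS_NAME constants.

```python
from urllib.parse import urljoin

OPTION_XPATH = "//select[@class='form-control']/option"


def collect_option_urls(driver):
    """Read every option value up front and return absolute URLs."""
    base_url = driver.current_url
    options = driver.find_elements("xpath", OPTION_XPATH)
    # Plain strings survive navigation; WebElement references do not.
    return [urljoin(base_url, opt.get_attribute("value")) for opt in options]


def scrape_all(driver):
    """Visit each stored URL directly and collect the HorDL blocks."""
    results = {}
    for url in collect_option_urls(driver):
        driver.get(url)  # no dropdown interaction, so nothing goes stale
        blocks = driver.find_elements("class name", "HorDL")
        results[url] = [b.get_attribute("innerHTML") for b in blocks]
    return results
```

Because the URLs are collected before any navigation happens, the loop never touches an element from a page that has already been left.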

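Otherwise, you would need to keep a counter and re-fetch the dropdown's contents on each iteration, or navigate back after each iteration; neither is necessary in this case.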

Select each option from a dropdown box using Selenium in Python
