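Iterating over a dropdown menu with Selenium and Python

Date: 2019-02-27 19:17:55

Tags: python selenium web-scraping

I am trying to work through the dropdown menus at the following URL: https://www.accuform.com/safety-sign/danger-danger-authorized-personnel-only-MADM006

For example, the first dropdown under Options lists the different materials. I would like to select each material in turn, collect some other information from the page, and then move on to the next material. Here is my current code: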

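import time

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import Select

driver = webdriver.Firefox()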
driver.get('https://www.accuform.com/safety-sign/danger-danger-authorized-personnel-only-MADM006')

time.sleep(3)

driver.find_element_by_id('x-mark-icon').click()

select = Select(driver.find_element_by_name('Wiqj7mb4rsAq9LB'))
options = select.options
optionsList = []

driver.find_elements_by_class_name('select-wrapper')[0].click()

element = driver.find_element_by_xpath("//select[@name='Wiqj7mb4rsAq9LB']")
actions = ActionChains(driver)
actions.move_to_element(element).perform()

# driver.execute_script("arguments[0].scrollIntoView();", element)


for option in options: #iterate over the options, place attribute value in list
    optionsList.append(option.get_attribute("value"))

for optionValue in optionsList:
    print("starting loop on option %s" % optionValue)
    # select = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//select[@name='Wiqj7mb4rsAq9LB']")))
    # select = Select(select)
    select.select_by_value(optionValue)

I was only just starting on the loop, but I ran into this error:

ElementNotInteractableException: Message: Element <option> could not be scrolled into view

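Then I added the WebDriverWait and got a TimeoutException.

Then I realized I should probably click the wrapper that contains the dropdown, so I added the click. It does open the menu, but I still get the TimeoutException.

So I thought maybe I should move to the element, which I tried with the ActionChains lines, and got this error: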

WebDriverException: Message: TypeError: rect is undefined

I tried to get around that error with the following line:

    # driver.execute_script("arguments[0].scrollIntoView();", element)

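That just resulted in the TimeoutException again.

I am fairly new to Python and Selenium, and I have basically just been adapting code from SO answers to similar questions, but nothing has worked.

I am using Python 3.6 with the current versions of Selenium and the Firefox webdriver.

If anything is unclear, or if you need more information, please let me know.

Thanks very much!

EDIT: Based on Kajal Kunda's answer and comments, I updated my code to the following: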

material_dropdown = driver.find_element_by_xpath("//input[@class='select-dropdown']")
driver.execute_script("arguments[0].click();", material_dropdown)
materials = driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")

for material in materials:
    # material_dropdown = driver.find_element_by_xpath("//input[@class='select-dropdown']")
    # driver.execute_script("arguments[0].click();", material_dropdown)
    # materials=driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")
    material_ele = material.find_element_by_tag_name('span')
    if material_ele.text != '':
        material_ele.click()
        time.sleep(5)
        price = driver.find_element_by_class_name("dataPriceDisplay")
        print(price.text)

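The result is that it successfully prints the price for the first material, but then raises: StaleElementReferenceException: Message: The element reference of <li class=""> is stale;...

I have tried variations of enabling the commented-out lines inside and outside the loop, but I always get some version of the StaleElementReferenceException.

Any suggestions?

Thanks!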

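2 Answers:

Answer 0 (score: 1)

You can do the whole thing with requests. Grab the options listed in the dropdown, then insert each option's value attribute into a request URL that returns JSON containing all the information shown on the page. The same principle applies to the other dropdowns. The ID for each dropdown selection is the value attribute of the chosen option, and these IDs appear in the URL shown below, separated by //.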

import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.accuform.com/product/getSku/danger-danger-authorized-personnel-only-MADM006/1/false/null//{}//WHFIw3xXmQx8zlz//6wr93DdrFo5JV//WdnO0RpwKpc4fGF'
startURL = 'https://www.accuform.com/safety-sign/danger-danger-authorized-personnel-only-MADM006'

res = requests.get(startURL)
soup = bs(res.content, 'lxml')
materials = [item['value'] for item in soup.select('#Wiqj7mb4rsAq9LB option')]
sizes = [item['value'] for item in soup.select('#WvXESrTyQjM3Ciw option')]
languages = [item['value'] for item in soup.select('#WUYWGMePtpmpmhy option')]
units = [item['value'] for item in soup.select('#W91eqaJ0WPXwe9b option')]

for material in materials:
    data = requests.get(url.format(material)).json()
    soup = bs(data['dataMaterialBullets'], 'lxml')
    lines = [item.text for item in soup.select('li')]
    print(lines)
    print(data['dataPriceDisplay'])     
    # etc......
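To extend the same idea to the other dropdowns, the URL can be assembled from a list of option IDs instead of a single format slot. The following is only a sketch: `BASE_URL`, `build_sku_url` and `all_sku_urls` are hypothetical names, and it assumes the endpoint expects one ID per dropdown, each preceded by //, in the order material, size, language, units. Only the material slot is confirmed by the URL above; the order of the remaining slots is an assumption that should be checked against the requests the page actually makes.

```python
from itertools import product

# Endpoint copied from the answer above, without the option-ID tail.
BASE_URL = ('https://www.accuform.com/product/getSku/'
            'danger-danger-authorized-personnel-only-MADM006/1/false/null')


def build_sku_url(option_ids, base_url=BASE_URL):
    """Append the selected option IDs to the endpoint, each preceded by '//'."""
    return base_url + ''.join('//' + option_id for option_id in option_ids)


def all_sku_urls(*option_lists):
    """Yield one URL per combination, taking one ID from each list in order.

    Assumption: the lists are passed in the same order as the URL slots.
    """
    for combination in product(*option_lists):
        yield build_sku_url(combination)
```

With the lists scraped above, `all_sku_urls(materials, sizes, languages, units)` would yield one URL per combination, and each could be passed to `requests.get(...).json()` in the same way as in the loop above. Empty option values (placeholder entries) would probably need to be filtered out first.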

JSON sample: (screenshot omitted)

Answer 1 (score: 0)

Try the code below. It should work.

    driver = webdriver.Firefox()
    driver.get('https://www.accuform.com/safety-sign/danger-danger-authorized-personnel-only-MADM006')

    time.sleep(3)
    driver.find_element_by_id('x-mark-icon').click()

    material_dropdown = driver.find_element_by_xpath("//input[@class='select-dropdown']")
    driver.execute_script("arguments[0].click();", material_dropdown)

    #Code for material dropdown
    materials=driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")


    material_optionsList = []
    for material in materials:
        material_ele=material.find_element_by_tag_name('span')
        if material_ele.text!='':
          material_optionsList.append(material_ele.text)

    print(material_optionsList)

    driver.execute_script("arguments[0].click();", material_dropdown)


    size_dropdown = driver.find_element_by_xpath("(//input[@class='select-dropdown'])[2]")
    driver.execute_script("arguments[0].click();", size_dropdown)

    #Code for size dropdown
    Sizes=driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")
    size_optionsList = []
    for size in Sizes:
        size_ele=size.find_element_by_tag_name('span')
        if size_ele.text!='':
            size_optionsList.append(size_ele.text)



    driver.execute_script("arguments[0].click();", size_dropdown)

Output:

[u'Adhesive Vinyl', u'Plastic', u'Adhesive Dura-Vinyl', u'Aluminum', u'Dura-Plastic\u2122', u'Aluma-Lite\u2122', u'Dura-Fiberglass\u2122', u'Accu-Shield\u2122']

Hopefully you can do the rest from here. Let me know whether it works for you.
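The code above collects the visible labels. If the underlying option values are needed as well, they can be read through JavaScript in the same way the clicks are issued. This is a sketch under assumptions: it presumes the native select from the question (name 'Wiqj7mb4rsAq9LB') is still present in the DOM but hidden behind the styled widget, which would also explain the "could not be scrolled into view" error. `READ_OPTION_VALUES_JS` and `option_values` are hypothetical names; `driver.execute_script(script, *args)` is the regular Selenium call.

```python
# JavaScript run in the page: collect the value of every <option> in the
# <select> whose name is passed as the first argument.
READ_OPTION_VALUES_JS = """
var select = document.getElementsByName(arguments[0])[0];
return Array.from(select.options).map(function (o) { return o.value; });
"""


def option_values(driver, select_name):
    """Return the non-empty option values of a <select>, read via JavaScript."""
    values = driver.execute_script(READ_OPTION_VALUES_JS, select_name)
    # Drop empty values, which typically belong to placeholder entries.
    return [value for value in values if value]
```

For example, `option_values(driver, 'Wiqj7mb4rsAq9LB')` should return the material IDs without opening the dropdown at all.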

Edited the code to loop through the materials and get the price for each one.

for index in range(len(materials)):
    # Index into the list, which is refreshed at the end of each pass
    material_ele = materials[index]

    if material_ele.text != '':
        driver.execute_script("arguments[0].click();", material_ele)
        time.sleep(2)
        price = driver.find_element_by_id("priceDisplay")
        print(price.text)
        time.sleep(2)
        # Reopen the dropdown and re-locate the options so the references are not stale
        material_dropdown = driver.find_element_by_xpath("//input[@class='select-dropdown']")
        driver.execute_script("arguments[0].click();", material_dropdown)
        materials = driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")

Output:

$8.31
$9.06
$13.22
$15.91
$15.91
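The loop above avoids the StaleElementReferenceException because it locates the option elements again after every selection instead of reusing references taken before the page changed. A minimal sketch of that pattern, separated from the Selenium calls, might look like this. `visit_each_option` and its three callables are hypothetical names introduced only for illustration.

```python
def visit_each_option(find_options, select_option, read_result):
    """Select every option in turn, re-locating the options before each use.

    find_options  -- callable returning the current list of option elements
    select_option -- callable that selects one element (the page may re-render)
    read_result   -- callable returning the value to collect after a selection
    """
    results = []
    count = len(find_options())
    for index in range(count):
        # Look the element up again: references obtained before the previous
        # selection may be stale once the widget has been re-rendered.
        option = find_options()[index]
        if option.text == '':
            continue  # skip blank placeholder entries
        select_option(option)
        results.append(read_result())
    return results
```

With the driver from this answer, `find_options` could reopen the dropdown and return `driver.find_elements_by_css_selector("div.select-wrapper ul.dropdown-content li")`, `select_option` could issue the JavaScript click, and `read_result` could return `driver.find_element_by_id("priceDisplay").text`. This assumes the number and order of the options stay the same between selections.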