Question

I need to extract the year, car model, and car make data from the following link: https://auto-buy.geico.com/nb#/sale/vehicle/gskmsi/

Here is my code:

# Written against the Selenium 3 API (executable_path, find_element_by_*).
# Selenium 4 uses webdriver.Chrome(service=Service(path)) and find_element(By.XPATH, ...) instead.
import time

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support import ui

# Raw string so the backslashes in the Windows path are not treated as escapes
chromedriver = r"D:\Codes\Webscraping\chromedriver.exe"

driver = webdriver.Chrome(executable_path=chromedriver)

try:
    driver.set_page_load_timeout(100)
    driver.get('https://auto-buy.geico.com/nb#/sale/vehicle/gskmsi/')
    # Each dropdown is selected only once it has become visible
    select_element = ui.Select(ui.WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "vehicleYear"))))
    select_element.select_by_visible_text("2017")
    time.sleep(5)
    select_element = ui.Select(ui.WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "vehicleMake"))))
    select_element.select_by_visible_text("Acura")
    time.sleep(5)
    select_element = ui.Select(ui.WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "vehicleModel"))))
    select_element.select_by_visible_text("ILX")
    for i in driver.find_elements_by_xpath("//*[@id='vehicleMake']"):
        print(i.get_attribute("value"))
    # Locate the Make dropdown by id, matching the By.ID lookups above
    select_box = ui.Select(driver.find_element_by_xpath("//select[@id='vehicleMake']"))
    # get all options (this prints WebElement objects, not their text)
    options = select_box.options
    print(options)
except TimeoutException as ex:
    isrunning = 0
    print("Exception has been thrown. " + str(ex))
    driver.close()

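Note: when you run the code, the first page that loads asks for customer information. You can fill it in with arbitrary values, using ZIP code 75002.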

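My question is: how do I extract all the values for year, car model, and car make from the website? Can Selenium help with this, or should I use Beautiful Soup instead? Any help with code would be great.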

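Edit: My code does not raise any errors. I just don't know how to write the code that extracts the year, car model, and car make values. Thanks in advance.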

Answer 1

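The biggest problem is that Make is populated only after a Year is selected, and Model is populated only after a Make is selected. You will have to iterate through each dropdown to retrieve all the values. I won't provide the whole code, but it should be fairly simple. First, get the Year dropdown and its values: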

year_dropdown = driver.find_element_by_xpath('//select[@id="vehicleYear"]')
years = [year.text for year in year_dropdown.find_elements_by_tag_name('option')]

The first item in this list will be blank, because the first option in the dropdown is blank. You can simply drop it:

years = years[1:]

Or, a safer approach:

years = [year for year in years if year]

This keeps only the non-empty values in the list.
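The filtering step can be wrapped in a small helper. This is only a sketch: `non_blank_texts` is a hypothetical name, and the `SimpleNamespace` objects stand in for Selenium option elements so the example runs without a browser. It also strips surrounding whitespace, which the list comprehension above does not do.

```python
from types import SimpleNamespace


def non_blank_texts(options):
    """Return the stripped text of each option, skipping blank placeholders."""
    texts = (option.text.strip() for option in options)
    return [text for text in texts if text]


# Stand-ins for <option> elements; real ones would come from
# year_dropdown.find_elements_by_tag_name('option').
fake_options = [
    SimpleNamespace(text=""),        # the blank first entry
    SimpleNamespace(text=" 2018 "),
    SimpleNamespace(text="2017"),
]

print(non_blank_texts(fake_options))  # ['2018', '2017']
```

With a live page, the same call would presumably be `years = non_blank_texts(year_dropdown.find_elements_by_tag_name('option'))`.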

Next, you will have to iterate over the Year dropdown:

for year in years:
    year_dropdown.find_element_by_xpath('.//option[text()="%s"]' % year).click()
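One caveat with building the XPath through `%s` formatting: it breaks if an option's text ever contains a double quote. The sketch below shows one way to guard against that. `xpath_literal` and `option_xpath` are hypothetical helper names, not part of Selenium, and whether any option on this site actually contains quotes is an open assumption.

```python
def xpath_literal(text):
    """Quote text as an XPath 1.0 string literal."""
    if '"' not in text:
        return '"%s"' % text
    if "'" not in text:
        return "'%s'" % text
    # Both quote types present: join the pieces with concat()
    parts = text.split('"')
    return "concat(" + ", '\"', ".join('"%s"' % part for part in parts) + ")"


def option_xpath(text):
    """Relative XPath matching the <option> whose text equals `text`."""
    return ".//option[text()=%s]" % xpath_literal(text)


print(option_xpath("2017"))  # .//option[text()="2017"]
```

Inside the loop, the click would then read `year_dropdown.find_element_by_xpath(option_xpath(year)).click()`.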

Inside that for loop, you now have to do the same thing, but for Make:

make_dropdown = driver.find_element_by_xpath('//select[@id="vehicleMake"]')
makes = [make.text for make in make_dropdown.find_elements_by_tag_name('option')]

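See where this is going? You are now repeating the same code as for the Year dropdown, but for Make. You will do the same for Model. Your flow ends up looking like: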

for year in years:
    for make in makes:
        for model in models:
           ...

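However, we don't know what you plan to do with the extracted data, so I can't help you with the output. But this is how you extract the data. Note that each iteration of a for loop overwrites its child lists: makes is overwritten each time you move on to a new year, and models is overwritten each time you move on to a new make.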
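Because the child lists are overwritten on every iteration, one option is to store them in a nested dictionary as you go. The sketch below illustrates the traversal only. `collect_vehicle_options`, `read_options`, `choose`, and `FakeForm` are all hypothetical names, and the sample values in `FakeForm.DATA` are placeholders, not data scraped from the site. The two callables are injected so the loop can be run without a browser.

```python
def collect_vehicle_options(read_options, choose):
    """Walk Year -> Make -> Model and return {year: {make: [models]}}.

    read_options(dropdown_id) returns the current non-blank option texts;
    choose(dropdown_id, text) selects one option in that dropdown.
    """
    catalog = {}
    for year in read_options("vehicleYear"):
        choose("vehicleYear", year)
        catalog[year] = {}
        # Makes must be re-read after every year selection
        for make in read_options("vehicleMake"):
            choose("vehicleMake", make)
            # Models must be re-read after every make selection
            catalog[year][make] = read_options("vehicleModel")
    return catalog


class FakeForm:
    """In-memory stand-in for the page's three dependent dropdowns."""

    # Placeholder values for illustration only
    DATA = {
        "2017": {"Acura": ["ILX", "Model-B"]},
        "2016": {"Make-C": ["Model-D"]},
    }

    def __init__(self):
        self.year = None
        self.make = None

    def read_options(self, dropdown_id):
        if dropdown_id == "vehicleYear":
            return list(self.DATA)
        if dropdown_id == "vehicleMake":
            return list(self.DATA.get(self.year, {}))
        return list(self.DATA.get(self.year, {}).get(self.make, []))

    def choose(self, dropdown_id, text):
        if dropdown_id == "vehicleYear":
            self.year, self.make = text, None  # a new year resets the make
        elif dropdown_id == "vehicleMake":
            self.make = text


form = FakeForm()
print(collect_vehicle_options(form.read_options, form.choose))
```

With Selenium, `read_options` could wrap the `options` of a `Select` built from the element with the given id, and `choose` could wrap `select_by_visible_text`, probably followed by an explicit wait so the child dropdown has time to repopulate.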
