Answer 0 (score: 2)
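The problem with your current approach is that you are locating an a (anchor) element and trying to use it as a select element. The Select class only works with select elements.
Note that in this case it is easier to locate the invisible select element and read the years directly from its options: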
options = [option.get_attribute("innerText") for option in driver.find_elements_by_css_selector("select#ddlYear option")[1:]]
print(options)
The [1:] slice here skips the first option, which is the "Select Year" placeholder.
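To illustrate the slice without a browser, here is a minimal sketch that uses a hypothetical list of option texts in place of the elements Selenium returns. The values are made up for illustration only.

```python
# Hypothetical option texts, standing in for what the select element exposes.
option_texts = ["Select Year", "2016", "2015", "2014"]

# [1:] drops the first entry (the placeholder) and keeps the rest in order.
years = option_texts[1:]
print(years)
```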
Complete working code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome("/usr/local/bin/chromedriver")
driver.get("https://www.osram-americas.com/en-us/applications/automotive-lighting-systems/Pages/lrgmain.aspx")

# Wait for the custom dropdown toggle to become visible, then open it.
wait = WebDriverWait(driver, 10)
toggle = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#fldYear a.sbToggle")))
toggle.click()

# Read the years from the hidden select element, skipping the placeholder option.
# Note: recent Selenium 4 releases removed find_elements_by_css_selector; there, use
# driver.find_elements(By.CSS_SELECTOR, "select#ddlYear option") instead.
options = [option.get_attribute("innerText") for option in driver.find_elements_by_css_selector("select#ddlYear option")[1:]]
print(options)
Prints:
[u'2016', u'2015', u'2014', u'2013', u'2012', u'2011', u'2010', u'2009', u'2008', u'2007', u'2006', u'2005', u'2004', u'2003', u'2002', u'2001', u'2000', u'1999', u'1998', u'1997', u'1996', u'1995', u'1994', u'1993', u'1992', u'1991', u'1990', u'1989', u'1988', u'1987', u'1986', u'1985', u'1984', u'1983', u'1982', u'1981', u'1980', u'1979', u'1978', u'1977', u'1976', u'1975', u'1974', u'1973', u'1972', u'1971', u'1970', u'1969', u'1968', u'1967', u'1966', u'1965', u'1964', u'1963', u'1962', u'1961', u'1960', u'1959', u'1958', u'1957', u'1956', u'1955']
Answer 1 (score: 2)
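You don't actually need selenium, or to automate any visual interaction, to get the year + make + model data from the page. The problem can be handled with requests alone by issuing the appropriate GET requests: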
# -*- coding: utf-8 -*-
from collections import defaultdict
from pprint import pprint
import requests
year = 2016
d = defaultdict(lambda: defaultdict(list))
with requests.Session() as session:
    session.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
    session.get("https://www.osram-americas.com/en-us/applications/automotive-lighting-systems/Pages/lrgmain.aspx")
    while True:
        response = session.get("https://www.osram-americas.com/_Layouts/Sylvania.Web.LRGHandler/LRGHandler.ashx",
                               params={"rt": "fetchmake", "year": str(year)})
        data = response.json()
        if not data:  # break if no makes in a year
            break
        for make in data:
            response = session.get("https://www.osram-americas.com/_Layouts/Sylvania.Web.LRGHandler/LRGHandler.ashx",
                                   params={"rt": "fetchmodel", "year": str(year), "make": make["Id"]})
            for model in response.json():
                d[year][make["Value"]].append(model["Value"])
        year -= 1
pprint(dict(d))
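The loop above relies on two things: the nested defaultdict creates entries on first access, and an empty response ends the walk backwards through the years. A minimal offline sketch of that logic follows. Here fetch_makes and fetch_models are hypothetical stand-ins for the two GET requests, and the sample data is invented, assuming the handler returns a list of dicts with "Id" and "Value" keys as the code above suggests.

```python
from collections import defaultdict

# Invented sample data; the real values come from the site's handler.
FAKE_MAKES = {
    2016: [{"Id": "1", "Value": "Acura"}, {"Id": "2", "Value": "Audi"}],
    2015: [{"Id": "1", "Value": "Acura"}],
}
FAKE_MODELS = {
    (2016, "1"): [{"Value": "ILX"}, {"Value": "MDX"}],
    (2016, "2"): [{"Value": "A4"}],
    (2015, "1"): [{"Value": "TLX"}],
}


def fetch_makes(year):
    """Hypothetical stand-in for the "fetchmake" request."""
    return FAKE_MAKES.get(year, [])


def fetch_models(year, make_id):
    """Hypothetical stand-in for the "fetchmodel" request."""
    return FAKE_MODELS.get((year, make_id), [])


def collect(start_year):
    d = defaultdict(lambda: defaultdict(list))
    year = start_year
    while True:
        makes = fetch_makes(year)
        if not makes:  # stop at the first year with no makes
            break
        for make in makes:
            for model in fetch_models(year, make["Id"]):
                d[year][make["Value"]].append(model["Value"])
        year -= 1
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {y: dict(by_make) for y, by_make in d.items()}


result = collect(2016)
print(result)
```

Note that a make with no models never appears in the result, since entries are only created when a model is appended.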