This is the site I am trying to scrape: https://finance.yahoo.com/quote/AAPL/options
I want to get the date values from the dropdown menu under this tag:
<select class="Fz(s)" data-reactid="5"></select>
Here is the code I am trying to run:
from bs4 import BeautifulSoup
from urllib2 import urlopen
optionsUrl = 'https://finance.yahoo.com/quote/aapl/options'
optionsPage = urlopen(optionsUrl)
soup = BeautifulSoup(optionsPage, 'lxml')
bigDates = soup.findAll('select' , {'class' : 'Fz(s)'})
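My problem is that it finds nothing, even though I can see the elements when I inspect the page in Chrome.
How can I get the dates in the select dropdown?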
Answer 0 (score: 1)
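The dropdown's contents are generated dynamically by JavaScript, so they are not in the HTML that urlopen downloads. To access the dates, you can use selenium to load the page in a real browser and read the rendered source:
Install selenium: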
pip install selenium
Install the appropriate driver for your browser of choice:
http://selenium-python.readthedocs.io/installation.html#drivers
from selenium import webdriver
from bs4 import BeautifulSoup as soup
driver = webdriver.Chrome('path/to/driver/')
driver.get('https://finance.yahoo.com/quote/aapl/options')
dates = [i.text for i in soup(driver.page_source, 'lxml').find_all('option')]
Output:
[u'February 16, 2018', u'February 23, 2018', u'March 2, 2018', u'March 9, 2018', u'March 23, 2018', u'March 29, 2018', u'April 20, 2018', u'May 18, 2018', u'June 15, 2018', u'July 20, 2018', u'September 21, 2018', u'January 18, 2019', u'January 17, 2020']
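To see why the original attempt returned no dates, it may help to compare the markup a plain HTTP fetch receives with the markup the browser ends up rendering. The following is only a minimal sketch in Python 3 using the standard library. The names OptionTextParser, option_texts and to_dates are hypothetical helpers, and both HTML strings are made-up stand-ins for the real page, not Yahoo's actual markup. The date format string is an assumption based on the output shown above.

```python
from datetime import date, datetime
from html.parser import HTMLParser


class OptionTextParser(HTMLParser):
    """Hypothetical helper: collect the text of every <option> element."""

    def __init__(self):
        super().__init__()
        self.options = []        # finished option labels, in document order
        self._in_option = False  # True while inside an <option> element
        self._buffer = []        # text fragments of the current option

    def handle_starttag(self, tag, attrs):
        if tag == 'option':
            self._in_option = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_option:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == 'option' and self._in_option:
            self.options.append(''.join(self._buffer).strip())
            self._in_option = False


def option_texts(html):
    """Return the label of each <option> found in an HTML string."""
    parser = OptionTextParser()
    parser.feed(html)
    parser.close()
    return parser.options


def to_dates(labels):
    """Parse labels such as 'February 16, 2018' into datetime.date objects.

    Assumes English month names; %B depends on the current locale.
    """
    return [datetime.strptime(label, '%B %d, %Y').date() for label in labels]


# Made-up stand-ins: the empty <select> a plain fetch would see, and the
# same element after the page's JavaScript has filled it in.
static_html = '<select class="Fz(s)" data-reactid="5"></select>'
rendered_html = (
    '<select class="Fz(s)" data-reactid="5">'
    '<option>February 16, 2018</option>'
    '<option>February 23, 2018</option>'
    '</select>'
)

print(option_texts(static_html))
print(to_dates(option_texts(rendered_html)))
```

With selenium, driver.page_source plays the role of rendered_html here, so the same helpers could be applied to it if BeautifulSoup is not available.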