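I have a question about the Yahoo Finance functionality in pandas-datareader. I have been using a list of stock tickers for a few months now, running the following lines: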
import pandas_datareader as pdr
import datetime
stocks = ["stock1","stock2",....]
start = datetime.datetime(2012,5,31)
end = datetime.datetime(2018,3,1)
f = pdr.DataReader(stocks, 'yahoo',start,end)
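Since yesterday I have been getting the error "IndexError: list index out of range", and it only appears when I try to fetch more than one stock.
Has anything changed in the last few days that I need to take into account, or do you have a better solution?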
Answer 0: (score: 7)
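If you read through the pandas-datareader documentation, you will see that they issued an immediate deprecation for several data source APIs, one of which is Yahoo! Finance.
v0.6.0 (January 24, 2018)
Immediate deprecation of Yahoo!, Google Options and Quotes and EDGAR. The end points behind these APIs have radically changed and the existing readers require complete rewrites. In the case of most Yahoo! data the endpoints have been removed. PDR would like to restore these features, and pull requests are welcome.
This is probably the culprit behind your IndexError (or any other error that does not normally occur).
However, there is another Python package whose goal is to fix the Yahoo! Finance support for pandas-datareader. You can find that package here: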
https://pypi.python.org/pypi/fix-yahoo-finance
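According to their documentation:
Yahoo! finance has decommissioned their historical data API, causing many programs that relied on it to stop working.
fix-yahoo-finance offers a temporary fix to the problem by scraping the data from Yahoo! finance and returning a Pandas DataFrame/Panel in the same format as pandas_datareader's get_data_yahoo(). By basically "hijacking" the pandas_datareader.data.get_data_yahoo() method, fix-yahoo-finance is easy to adopt and only requires importing fix_yahoo_finance into your code.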
All you need to add is:
import datetime
from pandas_datareader import data as pdr
import fix_yahoo_finance as yf
yf.pdr_override()
stocks = ["stock1","stock2", ...]
start = datetime.datetime(2012,5,31)
end = datetime.datetime(2018,3,1)
f = pdr.get_data_yahoo(stocks, start=start, end=end)
Or even without pandas-datareader at all:
import datetime
import fix_yahoo_finance as yf
stocks = ["stock1","stock2", ...]
start = datetime.datetime(2012,5,31)
end = datetime.datetime(2018,3,1)
data = yf.download(stocks, start=start, end=end)
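Because the error only shows up when several tickers are requested at once, one way to narrow it down is to request each ticker separately and record which ones fail. The following is a minimal sketch, not part of either library: `fetch_each` is a hypothetical helper, and `fetch` is any caller-supplied function that downloads a single ticker.

```python
def fetch_each(tickers, fetch):
    """Call fetch(ticker) for every ticker, collecting results and failures.

    `fetch` is an assumption: any callable that takes one ticker symbol and
    returns its data (or raises on failure).
    """
    results = {}
    failed = []
    for ticker in tickers:
        try:
            results[ticker] = fetch(ticker)
        except Exception as exc:  # keep going so one bad ticker does not abort the batch
            failed.append((ticker, repr(exc)))
    return results, failed
```

With the imports above, it could be called as `frames, failed = fetch_each(stocks, lambda t: yf.download(t, start=start, end=end))`, after which `failed` lists the tickers that raised.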
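Answer 1: (score: 5)
You can use the new Python YahooFinancials module together with pandas to do this. YahooFinancials is well built and gets its data by hashing out the data store object present in each Yahoo Finance web page, so it is fast and depends neither on the old discontinued API nor on a web driver the way a scraper does. Data is returned as JSON, and you can pull as many stocks as you like at once by passing a list of stock/index tickers when you initialize the YahooFinancials class.
$ pip install yahoofinancials
Usage example: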
from yahoofinancials import YahooFinancials
import pandas as pd
# Select Tickers and stock history dates
ticker = 'AAPL'
ticker2 = 'MSFT'
ticker3 = 'INTC'
index = '^NDX'
freq = 'daily'
start_date = '2012-10-01'
end_date = '2017-10-01'
# Function to clean data extracts
def clean_stock_data(stock_data_list):
    new_list = []
    for rec in stock_data_list:
        if 'type' not in rec.keys():
            new_list.append(rec)
    return new_list
# Construct yahoo financials objects for data extraction
aapl_financials = YahooFinancials(ticker)
mfst_financials = YahooFinancials(ticker2)
intl_financials = YahooFinancials(ticker3)
index_financials = YahooFinancials(index)
# Clean returned stock history data and remove dividend events from price history
daily_aapl_data = clean_stock_data(aapl_financials
                                   .get_historical_stock_data(start_date, end_date, freq)[ticker]['prices'])
daily_msft_data = clean_stock_data(mfst_financials
                                   .get_historical_stock_data(start_date, end_date, freq)[ticker2]['prices'])
daily_intl_data = clean_stock_data(intl_financials
                                   .get_historical_stock_data(start_date, end_date, freq)[ticker3]['prices'])
daily_index_data = index_financials.get_historical_stock_data(start_date, end_date, freq)[index]['prices']
stock_hist_data_list = [{'NDX': daily_index_data}, {'AAPL': daily_aapl_data}, {'MSFT': daily_msft_data},
                        {'INTL': daily_intl_data}]
# Function to construct a data frame based on each stock and its market index
def build_data_frame(data_list1, data_list2, data_list3, data_list4):
    data_dict = {}
    i = 0
    for list_item in data_list2:
        if 'type' not in list_item.keys():
            data_dict.update({list_item['formatted_date']: {'NDX': data_list1[i]['close'], 'AAPL': list_item['close'],
                                                            'MSFT': data_list3[i]['close'],
                                                            'INTL': data_list4[i]['close']}})
            i += 1
    tseries = pd.to_datetime(list(data_dict.keys()))
    df = pd.DataFrame(data=list(data_dict.values()), index=tseries,
                      columns=['NDX', 'AAPL', 'MSFT', 'INTL']).sort_index()
    return df
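Note that build_data_frame pairs records by list position, so it relies on all four price lists having the same length and date order. As an alternative, the closes can be aligned on their dates instead. This is a rough sketch only: `closes_by_date` is a hypothetical helper, and it assumes every price record carries the `formatted_date` and `close` keys shown in the JSON output of this answer.

```python
import pandas as pd


def closes_by_date(price_lists):
    """Build one close-price column per name, aligned on date.

    `price_lists` maps a column name to a list of price records (dicts).
    Records that carry a 'type' key (e.g. dividend events) are skipped.
    """
    columns = {}
    for name, records in price_lists.items():
        columns[name] = pd.Series(
            {rec['formatted_date']: rec['close'] for rec in records if 'type' not in rec},
            dtype='float64',
        )
    # pandas aligns the Series on the union of their dates; gaps become NaN
    frame = pd.DataFrame(columns)
    frame.index = pd.to_datetime(frame.index)
    return frame.sort_index()
```

For the lists built above, a call might look like `closes_by_date({'NDX': daily_index_data, 'AAPL': daily_aapl_data})`; a date missing from one list then shows up as NaN rather than shifting the rows that follow it.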
Example of working with data for multiple stocks at once (returns a list of JSON objects, one per ticker):
from yahoofinancials import YahooFinancials
tech_stocks = ['AAPL', 'MSFT', 'INTC']
bank_stocks = ['WFC', 'BAC', 'C']
yahoo_financials_tech = YahooFinancials(tech_stocks)
yahoo_financials_banks = YahooFinancials(bank_stocks)
tech_cash_flow_data_an = yahoo_financials_tech.get_financial_stmts('annual', 'cash')
bank_cash_flow_data_an = yahoo_financials_banks.get_financial_stmts('annual', 'cash')
banks_net_ebit = yahoo_financials_banks.get_ebit()
tech_stock_price_data = yahoo_financials_tech.get_stock_price_data()
daily_bank_stock_prices = yahoo_financials_banks.get_historical_stock_data('2008-09-15', '2017-09-15', 'daily')
JSON output example:
Code:
yahoo_financials = YahooFinancials('WFC')
print(yahoo_financials.get_historical_stock_data("2017-09-10", "2017-10-10", "monthly"))
JSON returned:
{
    "WFC": {
        "prices": [
            {
                "volume": 260271600,
                "formatted_date": "2017-09-30",
                "high": 55.77000045776367,
                "adjclose": 54.91999816894531,
                "low": 52.84000015258789,
                "date": 1506830400,
                "close": 54.91999816894531,
                "open": 55.15999984741211
            }
        ],
        "eventsData": [],
        "firstTradeDate": {
            "date": 76233600,
            "formatted_date": "1972-06-01"
        },
        "isPending": false,
        "timeZone": {
            "gmtOffset": -14400
        },
        "id": "1mo15050196001507611600"
    }
}
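Since the question is about getting the data into pandas, here is one possible way to turn the `prices` list of a response shaped like the JSON above into a DataFrame. It is a sketch under assumptions: `prices_to_frame` is a hypothetical helper, the `sample` dict just repeats the record shown above, and at least one price record is assumed to be present.

```python
import pandas as pd


def prices_to_frame(history, ticker):
    """Convert history[ticker]['prices'] into a DataFrame indexed by date."""
    # Skip event records (those with a 'type' key), keeping plain price rows
    records = [rec for rec in history[ticker]['prices'] if 'type' not in rec]
    frame = pd.DataFrame.from_records(records)
    frame['formatted_date'] = pd.to_datetime(frame['formatted_date'])
    return frame.set_index('formatted_date').sort_index()


# The single monthly record from the JSON output above
sample = {
    'WFC': {
        'prices': [
            {
                'volume': 260271600,
                'formatted_date': '2017-09-30',
                'high': 55.77000045776367,
                'adjclose': 54.91999816894531,
                'low': 52.84000015258789,
                'date': 1506830400,
                'close': 54.91999816894531,
                'open': 55.15999984741211,
            }
        ]
    }
}

wfc = prices_to_frame(sample, 'WFC')
print(wfc[['open', 'high', 'low', 'close', 'volume']])
```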
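Answer 2: (score: 0)
Since Yahoo changed its format, yahoo_finance no longer works; fix_yahoo_finance is enough to download the data. To parse it, however, you will need other libraries. Here is a simple working example: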
import numpy as np  # Python library for scientific computing
import pandas as pd  # Python library for data manipulation and analysis
import matplotlib.pyplot as plt  # Python library for charting
import fix_yahoo_finance as yf  # Python library to scrape data from Yahoo Finance
from pandas_datareader import data as pdr  # extract data from internet sources into a pandas data frame
yf.pdr_override()
data = pdr.get_data_yahoo('^DJI', start="2006-01-01")
data2 = pdr.get_data_yahoo("MSFT", start="2006-01-01")
data3 = pdr.get_data_yahoo("AAPL", start="2006-01-01")
data4 = pdr.get_data_yahoo("BB.TO", start="2006-01-01")
# Rebase every series to 100 at its first observation so they share one axis
ax = (data['Close'] / data['Close'].iloc[0] * 100).plot(figsize=(15, 6))
(data2['Close'] / data2['Close'].iloc[0] * 100).plot(ax=ax, figsize=(15, 6))
(data3['Close'] / data3['Close'].iloc[0] * 100).plot(ax=ax, figsize=(15, 6))
(data4['Close'] / data4['Close'].iloc[0] * 100).plot(ax=ax, figsize=(15, 6))
plt.legend(['Dow Jones', 'Microsoft', 'Apple', 'Blackberry'], loc='upper left')
plt.show()
For an explanation of the code, you can visit https://medium.com/@gerrysabar/charting-stocks-price-from-yahoo-finance-using-fix-yahoo-finance-library-6b690cac5447
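The plotting lines in this answer divide each Close series by its first value and multiply by 100, so every series starts at 100 and can share one axis. That step can be pulled into a small function; `rebase_to_100` is a hypothetical name and the prices below are made-up numbers for illustration, not market data.

```python
import pandas as pd


def rebase_to_100(series):
    """Scale a price series so that its first observation equals 100."""
    return series / series.iloc[0] * 100


# Illustrative values only
close = pd.Series(
    [50.0, 75.0, 25.0],
    index=pd.to_datetime(['2006-01-03', '2006-01-04', '2006-01-05']),
)
rebased = rebase_to_100(close)
print(rebased.tolist())  # [100.0, 150.0, 50.0]
```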
Answer 3: (score: 0)
Try this simple code
import pandas as pd
from pandas_datareader import data as wb  # assumed: "wb" is the usual alias for pandas_datareader.data

watchlist = ["stock1", "stock2", ...]
frames = []
for symbol in watchlist:
    print("Generating closing price for", symbol)
    result = wb.DataReader(symbol, start='05-1-20', end='05-20-20', data_source='yahoo')
    result["SYMBOL"] = symbol  # tag every row with the ticker it belongs to
    frames.append(result)
closing_price = pd.concat(frames)
print(closing_price)
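Once every row is tagged with its ticker in a SYMBOL column, the stacked table can be pivoted into one closing-price column per ticker. The sketch below uses a tiny hand-made table with placeholder tickers and prices (not real data), and it assumes the dates have been moved into a regular `Date` column, for example with `reset_index()`.

```python
import pandas as pd

# Illustrative long-format table: one row per (date, ticker)
long_prices = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-05-01", "2020-05-04", "2020-05-01", "2020-05-04"]),
        "SYMBOL": ["AAA", "AAA", "BBB", "BBB"],
        "Close": [10.0, 11.0, 20.0, 19.5],
    }
)

# One row per date, one column per ticker
wide_close = long_prices.pivot(index="Date", columns="SYMBOL", values="Close")
print(wide_close)
```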