I am trying to use pandas_datareader to download and organize data from Yahoo Finance. The procedure is simple.
For each stock, I do the following:
aapl = data.DataReader('AAPL', 'yahoo', '2004-01-01')
del aapl['Close']
aapl.rename(columns={'Adj Close': 'Close'}, inplace=True)
gs = data.DataReader('GS', 'yahoo', '2004-01-01')
del gs['Close']
gs.rename(columns={'Adj Close': 'Close'}, inplace=True)
Then I add a top column level named after the ticker:
aapl.columns = pd.MultiIndex.from_product([['aapl'], aapl.columns])
gs.columns = pd.MultiIndex.from_product([['gs'], gs.columns])
Finally, I put them together:
data = pd.concat([aapl, gs], axis = 1)
How can I do this with a for loop, so that it works for a list of 100+ tickers?
The structure would be something like this:
stocks = ['AAPL', 'GS']
for i in stocks:
    i = data.DataReader(i, 'yahoo', '2004-01-01')
    del i['Close']
    i.rename(columns={'Adj Close': 'Close'}, inplace=True)
    i.columns = pd.MultiIndex.from_product([['i'], i.columns])
    # append to df
The desired output for this dummy example would be:
df.head()
            aapl     gs
           Close  Close
Date
2004-01-02  1.38  83.58
2004-01-05  1.44  83.63
2004-01-06  1.43  83.13
2004-01-07  1.46  84.87
2004-01-08  1.51  84.98
Answer 0 (score: 3)
I would use a pandas.Panel:
In [69]: from pandas_datareader import data
In [70]: stocks = ['AAPL', 'GS']
Read the financial data for all stocks into a pandas.Panel in one step:
In [71]: p = data.DataReader(stocks, 'yahoo', '2004-01-01')
In [72]: p.axes
Out[72]:
[Index(['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'], dtype='object'),
 DatetimeIndex(['2004-01-02', '2004-01-05', '2004-01-06', '2004-01-07', '2004-01-08',
                '2004-01-09', '2004-01-12', '2004-01-13', '2004-01-14', '2004-01-15',
                ...
                '2017-02-16', '2017-02-17', '2017-02-21', '2017-02-22', '2017-02-23',
                '2017-02-24', '2017-02-27', '2017-02-28', '2017-03-01', '2017-03-02'],
               dtype='datetime64[ns]', name='Date', length=3314, freq=None),
 Index(['AAPL', 'GS'], dtype='object')]
Now you can slice this panel like this:
In [73]: p.loc['Adj Close']
Out[73]:
AAPL GS
Date
2004-01-02 1.378514 83.582711
2004-01-05 1.436168 83.625740
2004-01-06 1.430985 83.126634
2004-01-07 1.463375 84.873497
2004-01-08 1.513256 84.976763
2004-01-09 1.489935 83.901107
2004-01-12 1.537224 84.142053
2004-01-13 1.562488 84.047395
2004-01-14 1.567671 85.518890
... ... ...
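Note that pandas.Panel was deprecated in pandas 0.20 and removed in 0.25, so the slice above only works on older installations. As a rough sketch of the same "one field for every ticker" selection on a DataFrame with two column levels (field, then ticker): the name `prices` and all the numbers below are made-up placeholders, not Yahoo data.

```python
import pandas as pd

# Placeholder values, NOT real quotes; only the shape matters here.
dates = pd.DatetimeIndex(['2004-01-02', '2004-01-05', '2004-01-06'], name='Date')
columns = pd.MultiIndex.from_product([['Close', 'Adj Close'], ['AAPL', 'GS']])
prices = pd.DataFrame(
    [[21.0, 97.0, 1.0, 83.0],
     [22.0, 98.0, 2.0, 84.0],
     [23.0, 99.0, 3.0, 85.0]],
    index=dates, columns=columns)

# Selecting a label from the outer column level leaves one column per
# ticker, which plays the role of p.loc['Adj Close'] on the Panel.
adj_close = prices['Adj Close']
print(adj_close)
```

The result is an ordinary DataFrame indexed by Date with the tickers as columns.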
UPDATE: converting the Panel to a multi-level DataFrame:
MultiIndex DataFrame:
In [80]: p.to_frame()
Out[80]:
Open High Low Close Volume Adj Close
Date minor
2004-01-02 AAPL 21.549999 21.750000 21.180001 21.280000 36160600.0 1.378514
GS 98.800003 99.089996 96.580002 97.129997 3042300.0 83.582711
2004-01-05 AAPL 21.420000 22.390000 21.420000 22.170000 98754600.0 1.436168
GS 97.300003 97.940002 96.150002 97.180000 4817700.0 83.625740
2004-01-06 AAPL 22.250000 22.420001 21.710000 22.090000 127337000.0 1.430985
GS 97.360001 97.669998 96.379997 96.599998 4077800.0 83.126634
2004-01-07 AAPL 22.100000 22.830000 21.930000 22.590000 146718600.0 1.463375
GS 96.760002 98.860001 96.449997 98.629997 4457800.0 84.873497
2004-01-08 AAPL 22.840000 23.730001 22.649999 23.360001 115075800.0 1.513256
GS 98.730003 98.980003 97.699997 98.750000 3687800.0 84.976763
... ... ... ... ... ... ...
2017-02-24 AAPL 135.910004 136.660004 135.279999 136.660004 21690900.0 136.660004
GS 247.699997 248.880005 246.100006 247.350006 3565400.0 246.705168
2017-02-27 AAPL 137.139999 137.440002 136.279999 136.929993 20196400.0 136.929993
GS 247.210007 249.759995 246.610001 249.330002 2372600.0 248.680002
2017-02-28 AAPL 137.080002 137.440002 136.699997 136.990005 23403500.0 136.990005
GS 248.000000 249.000000 245.610001 248.059998 3627100.0 248.059998
2017-03-01 AAPL 137.889999 140.149994 137.600006 139.789993 36272400.0 139.789993
GS 253.710007 255.149994 251.259995 252.710007 5218300.0 252.710007
2017-03-02 AAPL 140.000000 140.279999 138.759995 138.960007 26153300.0 138.960007
GS 253.520004 254.240005 250.970001 251.059998 3014300.0 251.059998
[6628 rows x 6 columns]
MultiColumn DataFrame:
In [81]: p.to_frame().unstack()
Out[81]:
Open High Low Close \
minor AAPL GS AAPL GS AAPL GS AAPL GS
Date
2004-01-02 21.549999 98.800003 21.750000 99.089996 21.180001 96.580002 21.280000 97.129997
2004-01-05 21.420000 97.300003 22.390000 97.940002 21.420000 96.150002 22.170000 97.180000
2004-01-06 22.250000 97.360001 22.420001 97.669998 21.710000 96.379997 22.090000 96.599998
2004-01-07 22.100000 96.760002 22.830000 98.860001 21.930000 96.449997 22.590000 98.629997
2004-01-08 22.840000 98.730003 23.730001 98.980003 22.649999 97.699997 23.360001 98.750000
2004-01-09 23.229999 98.739998 24.130000 98.750000 22.789999 97.290001 23.000001 97.500000
2004-01-12 23.250000 97.599998 24.000000 97.849998 23.100000 96.449997 23.730001 97.779999
2004-01-13 24.700000 97.849998 24.839999 97.949997 23.860000 97.040001 24.120000 97.669998
2004-01-14 24.399999 97.500000 24.539999 99.500000 23.780000 97.459999 24.200000 99.379997
2004-01-15 22.910000 100.400002 23.400000 102.000000 22.499999 99.949997 22.850001 101.139999
... ... ... ... ... ... ... ... ...
2017-02-16 135.669998 250.300003 135.899994 250.779999 134.839996 248.440002 135.350006 249.440002
2017-02-17 135.100006 247.509995 135.830002 250.559998 135.100006 247.110001 135.720001 250.380005
2017-02-21 136.229996 251.000000 136.750000 252.649994 135.979996 250.710007 136.699997 251.759995
2017-02-22 136.429993 250.059998 137.119995 252.350006 136.110001 250.000000 137.110001 251.729996
2017-02-23 137.380005 251.309998 137.479996 251.899994 136.300003 249.320007 136.529999 251.190002
2017-02-24 135.910004 247.699997 136.660004 248.880005 135.279999 246.100006 136.660004 247.350006
2017-02-27 137.139999 247.210007 137.440002 249.759995 136.279999 246.610001 136.929993 249.330002
2017-02-28 137.080002 248.000000 137.440002 249.000000 136.699997 245.610001 136.990005 248.059998
2017-03-01 137.889999 253.710007 140.149994 255.149994 137.600006 251.259995 139.789993 252.710007
2017-03-02 140.000000 253.520004 140.279999 254.240005 138.759995 250.970001 138.960007 251.059998
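To see what the unstack() step does without a Panel or a download, here is a small offline sketch. It assumes a long frame indexed by (Date, minor), the same layout that to_frame() printed above; the names `long_frame` and `wide` and the numbers are invented for illustration.

```python
import pandas as pd

# Long layout: one row per (Date, ticker) pair, one column per field.
index = pd.MultiIndex.from_product(
    [pd.DatetimeIndex(['2004-01-02', '2004-01-05']), ['AAPL', 'GS']],
    names=['Date', 'minor'])
long_frame = pd.DataFrame({'Open': [1.0, 2.0, 3.0, 4.0],
                           'Close': [5.0, 6.0, 7.0, 8.0]}, index=index)

# unstack() moves the innermost row level ('minor') into the columns,
# giving (field, ticker) column pairs and one row per date.
wide = long_frame.unstack()
print(wide)
```

Each date now occupies a single row, and the ticker becomes the inner column level under each field.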
If needed, you can also swap and sort the multi-level columns:
In [96]: p.to_frame().unstack().swaplevel(axis=1).sort_index(1)
Out[96]:
minor AAPL GS \
Adj Close Close High Low Open Volume Adj Close Close
Date
2004-01-02 1.378514 21.280000 21.750000 21.180001 21.549999 36160600.0 83.582711 97.129997
2004-01-05 1.436168 22.170000 22.390000 21.420000 21.420000 98754600.0 83.625740 97.180000
2004-01-06 1.430985 22.090000 22.420001 21.710000 22.250000 127337000.0 83.126634 96.599998
2004-01-07 1.463375 22.590000 22.830000 21.930000 22.100000 146718600.0 84.873497 98.629997
2004-01-08 1.513256 23.360001 23.730001 22.649999 22.840000 115075800.0 84.976763 98.750000
2004-01-09 1.489935 23.000001 24.130000 22.789999 23.229999 106864800.0 83.901107 97.500000
2004-01-12 1.537224 23.730001 24.000000 23.100000 23.250000 121886800.0 84.142053 97.779999
2004-01-13 1.562488 24.120000 24.839999 23.860000 24.700000 169754200.0 84.047395 97.669998
2004-01-14 1.567671 24.200000 24.539999 23.780000 24.399999 155010800.0 85.518890 99.379997
2004-01-15 1.480218 22.850001 23.400000 22.499999 22.910000 254552200.0 87.033415 101.139999
... ... ... ... ... ... ... ... ...
2017-02-16 135.350006 135.350006 135.899994 134.839996 135.669998 22118000.0 248.789715 249.440002
2017-02-17 135.720001 135.720001 135.830002 135.100006 135.100006 22084500.0 249.727267 250.380005
2017-02-21 136.699997 136.699997 136.750000 135.979996 136.229996 24265100.0 251.103659 251.759995
2017-02-22 137.110001 137.110001 137.119995 136.110001 136.429993 20745300.0 251.073739 251.729996
2017-02-23 136.529999 136.529999 137.479996 136.300003 137.380005 20704100.0 250.535153 251.190002
2017-02-24 136.660004 136.660004 136.660004 135.279999 135.910004 21690900.0 246.705168 247.350006
2017-02-27 136.929993 136.929993 137.440002 136.279999 137.139999 20196400.0 248.680002 249.330002
2017-02-28 136.990005 136.990005 137.440002 136.699997 137.080002 23403500.0 248.059998 248.059998
2017-03-01 139.789993 139.789993 140.149994 137.600006 137.889999 36272400.0 252.710007 252.710007
2017-03-02 138.960007 138.960007 140.279999 138.759995 140.000000 26153300.0 251.059998 251.059998
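The swap-and-sort step can be tried on a tiny hand-built frame. This is only a sketch with invented numbers; the names `wide` and `by_ticker` are hypothetical, and `axis` is passed by keyword because, as far as I know, recent pandas releases no longer accept it positionally in sort_index.

```python
import pandas as pd

# Columns start as (field, ticker) pairs, as after unstack().
columns = pd.MultiIndex.from_product([['Open', 'Close'], ['AAPL', 'GS']],
                                     names=[None, 'minor'])
wide = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]],
                    index=pd.DatetimeIndex(['2004-01-02'], name='Date'),
                    columns=columns)

# swaplevel puts the ticker on top; sort_index then groups each
# ticker's fields together in alphabetical order.
by_ticker = wide.swaplevel(axis=1).sort_index(axis=1)
print(by_ticker.columns.tolist())
```

After this, `by_ticker['AAPL']` returns all fields for a single ticker as a plain DataFrame.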
Answer 1 (score: 1)
This downloads the pricing data for each stock in turn and uses pd.concat() to combine it into a single pandas.DataFrame.
Code:
import pandas as pd
from pandas_datareader import data

stocks = ['AAPL', 'GS']
# named `df` rather than `data` so the pandas_datareader module is not shadowed
df = None
for stock_name in stocks:
    # fetch the price data
    stock_data = data.DataReader(stock_name, 'yahoo', '2004-01-01')
    # remove the closing price
    del stock_data['Close']
    # rename Adjusted Close to Close
    stock_data.rename(columns={'Adj Close': 'Close'}, inplace=True)
    # Add a multi index for the stock name
    stock_data.columns = pd.MultiIndex.from_product(
        [[stock_name], stock_data.columns])
    # concat this stock to the previous stocks
    if df is None:
        df = stock_data
    else:
        df = pd.concat([df, stock_data], axis=1)
Result:
AAPL \
Open High Low Volume Close
Date
2004-01-02 21.549999 21.750000 21.180001 36160600 1.378514
2004-01-05 21.420000 22.390000 21.420000 98754600 1.436168
2004-01-06 22.250000 22.420001 21.710000 127337000 1.430985
... ... ... ... ... ...
2017-02-28 137.080002 137.440002 136.699997 23403500 136.990005
2017-03-01 137.889999 140.149994 137.600006 36272400 139.789993
2017-03-02 140.000000 140.279999 138.759995 26153300 138.960007
GS
Open High Low Volume Close
Date
2004-01-02 98.800003 99.089996 96.580002 3042300 83.582711
2004-01-05 97.300003 97.940002 96.150002 4817700 83.625740
2004-01-06 97.360001 97.669998 96.379997 4077800 83.126634
... ... ... ... ... ...
2017-02-28 248.000000 249.000000 245.610001 3627100 248.059998
2017-03-01 253.710007 255.149994 251.259995 5218300 252.710007
2017-03-02 253.520004 254.240005 250.970001 3014300 251.059998
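For a list of 100+ tickers, one possible variation is to collect the frames in a dict and call pd.concat() once, letting the dict keys become the outer column level. The sketch below runs offline: `fetch_prices` is a hypothetical stand-in for the data.DataReader call and returns invented numbers, and `tidy` is likewise just an illustrative helper name.

```python
import pandas as pd


def fetch_prices(ticker):
    """Hypothetical stand-in for data.DataReader(ticker, 'yahoo', '2004-01-01').

    Returns made-up numbers so the sketch runs without a network call.
    """
    dates = pd.DatetimeIndex(['2004-01-02', '2004-01-05'], name='Date')
    base = float(len(ticker))
    return pd.DataFrame({'Open': [base, base + 1],
                         'Close': [base + 2, base + 3],
                         'Adj Close': [base / 2, base / 2 + 1]}, index=dates)


def tidy(frame):
    # Drop the raw close and expose the adjusted close as 'Close'.
    return frame.drop(columns='Close').rename(columns={'Adj Close': 'Close'})


stocks = ['AAPL', 'GS']
frames = {ticker: tidy(fetch_prices(ticker)) for ticker in stocks}

# The dict keys become the outer column level, so no manual
# MultiIndex.from_product call is needed and concat runs only once.
combined = pd.concat(frames, axis=1)
print(combined)
```

Concatenating once at the end avoids re-copying the growing frame on every loop iteration, which may matter as the ticker list gets longer.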