How do I append to a multi-level pandas DataFrame using iteration?

Time: 2017-03-03 02:11:12

Tags: python pandas dataframe yahoo-finance multi-level

I am trying to download and organize data from Yahoo Finance using data_reader. The procedure is simple:

For each stock, I do the following:

aapl = data.DataReader('AAPL', 'yahoo', '2004-01-01')
del aapl['Close']
aapl.rename(columns={'Adj Close': 'Close'}, inplace=True)

gs = data.DataReader('GS', 'yahoo', '2004-01-01')
del gs['Close']
gs.rename(columns={'Adj Close': 'Close'}, inplace=True)

Then I add the ticker as a top column level:

aapl.columns = pd.MultiIndex.from_product([['aapl'], aapl.columns])
gs.columns = pd.MultiIndex.from_product([['gs'], gs.columns])

Finally, I put them together:

df = pd.concat([aapl, gs], axis=1)

How can I do this with a for loop so that it works for a list of 100+ tickers?

The structure would be something like this:

stocks = ['AAPL', 'GS']

for i in stocks:
    i = data.DataReader(i, 'yahoo', '2004-01-01')
    del i['Close']
    i.rename(columns={'Adj Close': 'Close'}, inplace=True)
    i.columns = pd.MultiIndex.from_product([['i'], i.columns])
    # append to df

The desired output for the dummy example would be:

 df.head()

            aapl    gs
           Close    Close
Date        
2004-01-02  1.38    83.58
2004-01-05  1.44    83.63
2004-01-06  1.43    83.13
2004-01-07  1.46    84.87
2004-01-08  1.51    84.98

2 Answers:

Answer 0 (score: 3)

I would use a Pandas.Panel:

In [69]: from pandas_datareader import data

In [70]: stocks = ['AAPL', 'GS']

Read the financial data for all stocks into a Pandas.Panel in one step:

In [71]: p = data.DataReader(stocks, 'yahoo', '2004-01-01')

In [72]: p.axes
Out[72]:
[Index(['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'], dtype='object'),
 DatetimeIndex(['2004-01-02', '2004-01-05', '2004-01-06', '2004-01-07', '2004-01-08', '2004-01-09', '2004-01-12', '2004-01-13', '2004-01-14', '2004-01-15',
                ...
                '2017-02-16', '2017-02-17', '2017-02-21', '2017-02-22', '2017-02-23', '2017-02-24', '2017-02-27', '2017-02-28', '2017-03-01', '2017-03-02'],
               dtype='datetime64[ns]', name='Date', length=3314, freq=None),
 Index(['AAPL', 'GS'], dtype='object')]

Now you can slice the panel like this:

In [73]: p.loc['Adj Close']
Out[73]:
                  AAPL          GS
Date
2004-01-02    1.378514   83.582711
2004-01-05    1.436168   83.625740
2004-01-06    1.430985   83.126634
2004-01-07    1.463375   84.873497
2004-01-08    1.513256   84.976763
2004-01-09    1.489935   83.901107
2004-01-12    1.537224   84.142053
2004-01-13    1.562488   84.047395
2004-01-14    1.567671   85.518890
...                ...         ...

UPDATE: converting the Panel to a multi-level DataFrame:

MultiIndex DataFrame:

In [80]: p.to_frame()
Out[80]:
                        Open        High         Low       Close       Volume   Adj Close
Date       minor
2004-01-02 AAPL    21.549999   21.750000   21.180001   21.280000   36160600.0    1.378514
           GS      98.800003   99.089996   96.580002   97.129997    3042300.0   83.582711
2004-01-05 AAPL    21.420000   22.390000   21.420000   22.170000   98754600.0    1.436168
           GS      97.300003   97.940002   96.150002   97.180000    4817700.0   83.625740
2004-01-06 AAPL    22.250000   22.420001   21.710000   22.090000  127337000.0    1.430985
           GS      97.360001   97.669998   96.379997   96.599998    4077800.0   83.126634
2004-01-07 AAPL    22.100000   22.830000   21.930000   22.590000  146718600.0    1.463375
           GS      96.760002   98.860001   96.449997   98.629997    4457800.0   84.873497
2004-01-08 AAPL    22.840000   23.730001   22.649999   23.360001  115075800.0    1.513256
           GS      98.730003   98.980003   97.699997   98.750000    3687800.0   84.976763
...                      ...         ...         ...         ...          ...         ...
2017-02-24 AAPL   135.910004  136.660004  135.279999  136.660004   21690900.0  136.660004
           GS     247.699997  248.880005  246.100006  247.350006    3565400.0  246.705168
2017-02-27 AAPL   137.139999  137.440002  136.279999  136.929993   20196400.0  136.929993
           GS     247.210007  249.759995  246.610001  249.330002    2372600.0  248.680002
2017-02-28 AAPL   137.080002  137.440002  136.699997  136.990005   23403500.0  136.990005
           GS     248.000000  249.000000  245.610001  248.059998    3627100.0  248.059998
2017-03-01 AAPL   137.889999  140.149994  137.600006  139.789993   36272400.0  139.789993
           GS     253.710007  255.149994  251.259995  252.710007    5218300.0  252.710007
2017-03-02 AAPL   140.000000  140.279999  138.759995  138.960007   26153300.0  138.960007
           GS     253.520004  254.240005  250.970001  251.059998    3014300.0  251.059998

[6628 rows x 6 columns]

MultiColumn DataFrame:

In [81]: p.to_frame().unstack()
Out[81]:
                  Open                    High                     Low                   Close              \
minor             AAPL          GS        AAPL          GS        AAPL          GS        AAPL          GS
Date
2004-01-02   21.549999   98.800003   21.750000   99.089996   21.180001   96.580002   21.280000   97.129997
2004-01-05   21.420000   97.300003   22.390000   97.940002   21.420000   96.150002   22.170000   97.180000
2004-01-06   22.250000   97.360001   22.420001   97.669998   21.710000   96.379997   22.090000   96.599998
2004-01-07   22.100000   96.760002   22.830000   98.860001   21.930000   96.449997   22.590000   98.629997
2004-01-08   22.840000   98.730003   23.730001   98.980003   22.649999   97.699997   23.360001   98.750000
2004-01-09   23.229999   98.739998   24.130000   98.750000   22.789999   97.290001   23.000001   97.500000
2004-01-12   23.250000   97.599998   24.000000   97.849998   23.100000   96.449997   23.730001   97.779999
2004-01-13   24.700000   97.849998   24.839999   97.949997   23.860000   97.040001   24.120000   97.669998
2004-01-14   24.399999   97.500000   24.539999   99.500000   23.780000   97.459999   24.200000   99.379997
2004-01-15   22.910000  100.400002   23.400000  102.000000   22.499999   99.949997   22.850001  101.139999
...                ...         ...         ...         ...         ...         ...         ...         ...
2017-02-16  135.669998  250.300003  135.899994  250.779999  134.839996  248.440002  135.350006  249.440002
2017-02-17  135.100006  247.509995  135.830002  250.559998  135.100006  247.110001  135.720001  250.380005
2017-02-21  136.229996  251.000000  136.750000  252.649994  135.979996  250.710007  136.699997  251.759995
2017-02-22  136.429993  250.059998  137.119995  252.350006  136.110001  250.000000  137.110001  251.729996
2017-02-23  137.380005  251.309998  137.479996  251.899994  136.300003  249.320007  136.529999  251.190002
2017-02-24  135.910004  247.699997  136.660004  248.880005  135.279999  246.100006  136.660004  247.350006
2017-02-27  137.139999  247.210007  137.440002  249.759995  136.279999  246.610001  136.929993  249.330002
2017-02-28  137.080002  248.000000  137.440002  249.000000  136.699997  245.610001  136.990005  248.059998
2017-03-01  137.889999  253.710007  140.149994  255.149994  137.600006  251.259995  139.789993  252.710007
2017-03-02  140.000000  253.520004  140.279999  254.240005  138.759995  250.970001  138.960007  251.059998
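Panel was removed in pandas 0.25, so `p.to_frame()` will not run on current versions. The reshaping step itself can still be tried on a small hand-built frame. The following is a minimal sketch, assuming a long frame indexed by (Date, minor) like the one `p.to_frame()` returns; the names `long_df` and `wide_df` are illustrative, and the values are rounded from the output above.

```python
import pandas as pd

dates = pd.to_datetime(['2004-01-02', '2004-01-05'])

# Long format: one row per (Date, ticker) pair, mirroring the shape of p.to_frame()
index = pd.MultiIndex.from_product([dates, ['AAPL', 'GS']],
                                   names=['Date', 'minor'])
long_df = pd.DataFrame(
    {'Open': [21.55, 98.80, 21.42, 97.30],
     'Close': [21.28, 97.13, 22.17, 97.18]},
    index=index,
)

# unstack() moves the innermost index level ('minor') into the columns,
# giving one row per date and a (field, ticker) column MultiIndex
wide_df = long_df.unstack()
print(wide_df)
```

Each row of `wide_df` is a single date, and a column such as `('Open', 'GS')` holds one field for one ticker.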

If needed, you can also swap and sort the multi-level columns:

In [96]: p.to_frame().unstack().swaplevel(axis=1).sort_index(axis=1)
Out[96]:
minor             AAPL                                                                       GS              \
             Adj Close       Close        High         Low        Open       Volume   Adj Close       Close
Date
2004-01-02    1.378514   21.280000   21.750000   21.180001   21.549999   36160600.0   83.582711   97.129997
2004-01-05    1.436168   22.170000   22.390000   21.420000   21.420000   98754600.0   83.625740   97.180000
2004-01-06    1.430985   22.090000   22.420001   21.710000   22.250000  127337000.0   83.126634   96.599998
2004-01-07    1.463375   22.590000   22.830000   21.930000   22.100000  146718600.0   84.873497   98.629997
2004-01-08    1.513256   23.360001   23.730001   22.649999   22.840000  115075800.0   84.976763   98.750000
2004-01-09    1.489935   23.000001   24.130000   22.789999   23.229999  106864800.0   83.901107   97.500000
2004-01-12    1.537224   23.730001   24.000000   23.100000   23.250000  121886800.0   84.142053   97.779999
2004-01-13    1.562488   24.120000   24.839999   23.860000   24.700000  169754200.0   84.047395   97.669998
2004-01-14    1.567671   24.200000   24.539999   23.780000   24.399999  155010800.0   85.518890   99.379997
2004-01-15    1.480218   22.850001   23.400000   22.499999   22.910000  254552200.0   87.033415  101.139999
...                ...         ...         ...         ...         ...          ...         ...         ...
2017-02-16  135.350006  135.350006  135.899994  134.839996  135.669998   22118000.0  248.789715  249.440002
2017-02-17  135.720001  135.720001  135.830002  135.100006  135.100006   22084500.0  249.727267  250.380005
2017-02-21  136.699997  136.699997  136.750000  135.979996  136.229996   24265100.0  251.103659  251.759995
2017-02-22  137.110001  137.110001  137.119995  136.110001  136.429993   20745300.0  251.073739  251.729996
2017-02-23  136.529999  136.529999  137.479996  136.300003  137.380005   20704100.0  250.535153  251.190002
2017-02-24  136.660004  136.660004  136.660004  135.279999  135.910004   21690900.0  246.705168  247.350006
2017-02-27  136.929993  136.929993  137.440002  136.279999  137.139999   20196400.0  248.680002  249.330002
2017-02-28  136.990005  136.990005  137.440002  136.699997  137.080002   23403500.0  248.059998  248.059998
2017-03-01  139.789993  139.789993  140.149994  137.600006  137.889999   36272400.0  252.710007  252.710007
2017-03-02  138.960007  138.960007  140.279999  138.759995  140.000000   26153300.0  251.059998  251.059998
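The effect of swapping and sorting the column levels is easier to see on a tiny frame. This is a sketch under the assumption that the columns start out as (field, ticker), as in the unstacked output; `by_field` and `by_ticker` are illustrative names and the values are rounded from the output above.

```python
import pandas as pd

dates = pd.to_datetime(['2004-01-02', '2004-01-05'])
columns = pd.MultiIndex.from_product([['Open', 'Close'], ['AAPL', 'GS']])
by_field = pd.DataFrame(
    [[21.55, 98.80, 21.28, 97.13],
     [21.42, 97.30, 22.17, 97.18]],
    index=dates, columns=columns,
)

# swaplevel puts the ticker on top; sort_index then groups each
# ticker's fields next to each other (in alphabetical order)
by_ticker = by_field.swaplevel(axis=1).sort_index(axis=1)
print(by_ticker.columns.tolist())
```

With the ticker as the top level, `by_ticker['AAPL']` selects all fields for one stock, which is the layout the question asks for.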

Answer 1 (score: 1)

This downloads the price data for each stock in turn and uses pd.concat() to combine it into a single pandas.DataFrame.

Code:

import pandas as pd
from pandas_datareader import data

stocks = ['AAPL', 'GS']
# accumulator for the result; not named `data`, which would shadow the module
df = None
for stock_name in stocks:
    # fetch the price data
    stock_data = data.DataReader(stock_name, 'yahoo', '2004-01-01')

    # remove the closing price
    del stock_data['Close']

    # rename Adjusted Close to Close
    stock_data.rename(columns={'Adj Close': 'Close'}, inplace=True)

    # Add a multi index for the stock name
    stock_data.columns = pd.MultiIndex.from_product(
        [[stock_name], stock_data.columns])

    # concat this stock to the previous stocks
    if df is None:
        df = stock_data
    else:
        df = pd.concat([df, stock_data], axis=1)

Result:

                  AAPL                                                 \
                  Open        High         Low     Volume       Close   
Date                                                                    
2004-01-02   21.549999   21.750000   21.180001   36160600    1.378514   
2004-01-05   21.420000   22.390000   21.420000   98754600    1.436168   
2004-01-06   22.250000   22.420001   21.710000  127337000    1.430985   
...                ...         ...         ...        ...         ...   
2017-02-28  137.080002  137.440002  136.699997   23403500  136.990005   
2017-03-01  137.889999  140.149994  137.600006   36272400  139.789993   
2017-03-02  140.000000  140.279999  138.759995   26153300  138.960007   

                    GS                                               
                  Open        High         Low   Volume       Close  
Date                                                                 
2004-01-02   98.800003   99.089996   96.580002  3042300   83.582711  
2004-01-05   97.300003   97.940002   96.150002  4817700   83.625740  
2004-01-06   97.360001   97.669998   96.379997  4077800   83.126634  
...                ...         ...         ...      ...         ...  
2017-02-28  248.000000  249.000000  245.610001  3627100  248.059998  
2017-03-01  253.710007  255.149994  251.259995  5218300  252.710007  
2017-03-02  253.520004  254.240005  250.970001  3014300  251.059998
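For a list of 100+ tickers, concatenating inside the loop copies the growing frame on every pass. One possible variation is to collect the per-ticker frames in a list and call pd.concat() once, letting its `keys` argument build the top column level. The sketch below is an assumption-laden illustration, not the answer's code: `fetch_prices` is a hypothetical offline stand-in for `data.DataReader(ticker, 'yahoo', '2004-01-01')` that returns placeholder values.

```python
import pandas as pd

def fetch_prices(ticker):
    """Hypothetical stand-in for data.DataReader(ticker, 'yahoo', '2004-01-01').

    Returns a small frame of placeholder values so the sketch runs offline.
    """
    dates = pd.date_range('2004-01-02', periods=3, freq='B', name='Date')
    cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
    return pd.DataFrame(float(len(ticker)), index=dates, columns=cols)

stocks = ['AAPL', 'GS']
frames = []
for ticker in stocks:
    prices = fetch_prices(ticker)
    # drop the raw close and keep the adjusted close under the name 'Close'
    prices = prices.drop(columns='Close').rename(columns={'Adj Close': 'Close'})
    frames.append(prices)

# keys= adds the ticker as the top column level, so the MultiIndex
# does not have to be built by hand for each frame
combined = pd.concat(frames, axis=1, keys=stocks)
print(combined.columns.tolist())
```

Swapping `fetch_prices` for the real download call should leave the rest unchanged, provided each download returns the same six columns.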