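Applying a function to a MultiIndex DataFrame with groupby

Time: 2020-02-16 11:50:10

Tags: python pandas group-by

I want to run groupby('Ticker') on this MultiIndex DataFrame, then apply a function that returns a Series for each ticker, and add the result to df as a new column.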

def Indicator(dataf):

    df = dataf.copy()
    df['TR1'] = df.High.sub(df.Low)
    df['TR2'] = abs(df.High.sub(df.Close.shift(1)))
    df['TR3'] = abs(df.Low.sub(df.Close.shift(1)))
    df['TR'] = df[['TR1', 'TR2', 'TR3']].max(axis=1)
    df['TR_mean'] = df['TR'].resample('M').mean().shift(1).resample('D').fillna('bfill')
    df['Vol_mean'] = df['Volume'].resample('M').mean().shift(1).resample('D').fillna('bfill')
    indicator = (df.TR.div(df.TR_mean)).div(df.Volume.div(df.Vol_mean))

    return indicator

I tried something like this:

tickers.groupby('Ticker').apply(Indicator)

But I get this error: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'

The DataFrame:

                          Close          High           Low          Open        Volume
Date       Ticker
2010-01-04 AAPL     6048.299805   6048.299805   5974.430176   5975.520020  1.043444e+08
           GOOG     1132.989990   1133.869995   1116.560059   1116.560059  3.991400e+09
           TSM      10654.79003   10694.49023   10608.13948   10609.33984  1.044000e+05
2010-01-05 AAPL     6031.859863   6058.020020   6015.669922   6043.939941  1.175721e+08
           GOOG     1132.989990   1133.869995   1116.560059   1116.560059  3.991400e+09
           TSM      10654.79003   10694.49023   10608.13948   10609.33984  1.044000e+05

1 answer:

Answer 0 (score: 1)

To resolve the error, you only need to add the following line to the Indicator function, right after the copy operation:

df.index = df.index.get_level_values(0)

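The problem is that the function receives a MultiIndex rather than a DatetimeIndex, and the resample method it calls only works on time-series indexes. The extra line replaces the MultiIndex with the Date level of the index. The result is as follows: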

>>> df_orig
                          Close          High           Low          Open        Volume
Date       Ticker
2010-01-04 AAPL     6048.299805   6048.299805   5974.430176   5975.520020  1.043444e+08
           GOOGL    1132.989990   1133.869995   1116.560059   1116.560059  3.991400e+09
           TSM     10654.790030  10694.490230  10608.139480  10609.339840  1.044000e+05
2010-01-05 AAPL     6031.859863   6058.020020   6015.669922   6043.939941  1.175721e+08
           GOOGL    1132.989990   1133.869995   1116.560059   1116.560059  3.991400e+09
           TSM     10654.790030  10694.490230  10608.139480  10609.339840  1.044000e+05

>>> df_orig.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6 entries, (2010-01-04 00:00:00, AAPL) to (2010-01-05 00:00:00, TSM)
Data columns (total 5 columns):
Close     6 non-null float64
High      6 non-null float64
Low       6 non-null float64
Open      6 non-null float64
Volume    6 non-null float64
dtypes: float64(5)
memory usage: 410.0+ bytes

>>> df_orig.groupby("Ticker").apply(Indicator)
Date    2010-01-04  2010-01-05
Ticker
AAPL           NaN         NaN
GOOGL          NaN         NaN
TSM            NaN         NaN
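
To make the cause concrete, here is a minimal sketch on made-up data. The `toy` frame and the two helper functions are hypothetical and not part of the original code; the sketch only assumes that each group handed over by groupby still carries the full (Date, Ticker) MultiIndex, which is what resample rejects.

```python
import pandas as pd

# Hypothetical toy data shaped like the frame in the question.
dates = pd.to_datetime(["2010-01-04", "2010-01-05"])
idx = pd.MultiIndex.from_product([dates, ["AAPL", "TSM"]], names=["Date", "Ticker"])
toy = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0]}, index=idx)


def daily_mean_raw(group):
    # The group still has the (Date, Ticker) MultiIndex, so resample fails.
    return group["Close"].resample("D").mean()


def daily_mean_fixed(group):
    group = group.copy()
    # Keep only the Date level, as in the one-line fix above.
    group.index = group.index.get_level_values(0)
    return group["Close"].resample("D").mean()


try:
    toy.groupby("Ticker").apply(daily_mean_raw)
    raised = False
except TypeError:
    raised = True

fixed = toy.groupby("Ticker").apply(daily_mean_fixed)
print(raised)
print(fixed)
```

The first call should fail with the error from the question, while the second one runs because each group now has a plain DatetimeIndex.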

Alternatively, you can drop the Ticker level from the index before the groupby-apply step, like this:

ticker_idx = df_orig.index.get_level_values(1)
df_orig.reset_index(1, drop=True).groupby(ticker_idx).apply(Indicator)

This way, you don't need to add the extra line to the function.
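
The question also asks for the result to end up in a new column, which the snippets above do not show. One possible way, sketched here on made-up data, is to compute the per-ticker Series in a loop and let pandas align them back on the (Date, Ticker) index. The `toy` frame, the `high_low_range` stand-in for Indicator, and the column name `Indicator` are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data; High/Low values are invented.
dates = pd.date_range("2010-01-04", periods=3, freq="D")
idx = pd.MultiIndex.from_product([dates, ["AAPL", "TSM"]], names=["Date", "Ticker"])
toy = pd.DataFrame(
    {"High": [10.0, 20.0, 11.0, 21.0, 12.0, 22.0],
     "Low": [9.0, 18.0, 8.0, 17.0, 7.0, 16.0]},
    index=idx,
)


def high_low_range(group):
    # Stand-in for Indicator: returns one value per date, indexed by Date only.
    return group["High"].sub(group["Low"])


pieces = {}
for ticker, group in toy.groupby(level="Ticker"):
    # Drop the Ticker level so the function sees a plain DatetimeIndex.
    pieces[ticker] = high_low_range(group.droplevel("Ticker"))

# concat on a dict builds a (Ticker, Date) index; swap it back to (Date, Ticker).
stacked = pd.concat(pieces, names=["Ticker", "Date"]).swaplevel().sort_index()

# Assignment aligns on the MultiIndex, so every row gets its own ticker's value.
toy["Indicator"] = stacked
print(toy)
```

This only works as written if the function returns values for the same dates that exist in the original frame; dates it does not return would show up as NaN in the new column.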

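Also, the groupby-apply operation gives me nothing but NaN, but from reading your function's code I think this is because the function expects more than 2 days of data. Let me know if that is right.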
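
A quick way to check that guess is to run the same resample chain on its own. The sketch below uses invented series and a hypothetical helper `prior_month_mean`; `pd.offsets.MonthEnd()` and `.bfill()` are my substitutions for `'M'` and `.fillna('bfill')`, chosen because they should behave the same while being accepted by more pandas versions.

```python
import pandas as pd


def prior_month_mean(series):
    # Same chain as TR_mean / Vol_mean in Indicator:
    # monthly mean, shifted by one month, then spread back over days.
    monthly = series.resample(pd.offsets.MonthEnd()).mean().shift(1)
    return monthly.resample("D").bfill()


# Two days of data, as in the sample frame (values are made up).
two_days = pd.Series([1.0, 2.0], index=pd.to_datetime(["2010-01-04", "2010-01-05"]))

# Three full months of made-up daily data.
three_months = pd.Series(1.0, index=pd.date_range("2010-01-01", "2010-03-31", freq="D"))

# reindex mimics what happens when the result is assigned back as a column.
short = prior_month_mean(two_days).reindex(two_days.index)
longer = prior_month_mean(three_months).reindex(three_months.index)

print(short.isna().all())
print(longer.loc["2010-01"].isna().all())
print(longer.loc["2010-02":].notna().all())
```

With a single month of data the shifted monthly mean has no previous month to draw from, so every day comes out as NaN; once a second month is present, values appear from that month onward. That would be consistent with the all-NaN output above, assuming the real data behaves like this toy series.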