I want to run groupby('Ticker') on this MultiIndex DataFrame, apply a function that returns a Series for each ticker, and add the result to df as a new column.
def Indicator(dataf):
    df = dataf.copy()
    df['TR1'] = df.High.sub(df.Low)
    df['TR2'] = abs(df.High.sub(df.Close.shift(1)))
    df['TR3'] = abs(df.Low.sub(df.Close.shift(1)))
    df['TR'] = df[['TR1', 'TR2', 'TR3']].max(axis=1)
    df['TR_mean'] = df['TR'].resample('M').mean().shift(1).resample('D').fillna('bfill')
    df['Vol_mean'] = df['Volume'].resample('M').mean().shift(1).resample('D').fillna('bfill')
    indicator = (df.TR.div(df.TR_mean)).div(df.Volume.div(df.Vol_mean))
    return indicator
I tried something like this:
tickers.groupby('Ticker').apply(Indicator)
But I get this error: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'
The DataFrame:
                          Close          High           Low          Open        Volume
Date       Ticker
2010-01-04 AAPL     6048.299805   6048.299805   5974.430176   5975.520020  1.043444e+08
           GOOG     1132.989990   1133.869995   1116.560059   1116.560059  3.991400e+09
           TSM      10654.79003   10694.49023   10608.13948   10609.33984  1.044000e+05
2010-01-05 AAPL     6031.859863   6058.020020   6015.669922   6043.939941  1.175721e+08
           GOOG     1132.989990   1133.869995   1116.560059   1116.560059  3.991400e+09
           TSM      10654.79003   10694.49023   10608.13948   10609.33984  1.044000e+05
Answer 0 (score: 1)
To fix the error, you only need to add the following line to the Indicator function, right after the copy operation:
df.index = df.index.get_level_values(0)
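The problem is indeed that you are passing a MultiIndex, rather than a DatetimeIndex, to the resample method inside the function, and resample only works on time series. The extra line simply replaces the MultiIndex with its Date level. Here is the result: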
>>> df_orig
                          Close          High           Low          Open        Volume
Date       Ticker
2010-01-04 AAPL     6048.299805   6048.299805   5974.430176   5975.520020  1.043444e+08
           GOOGL    1132.989990   1133.869995   1116.560059   1116.560059  3.991400e+09
           TSM     10654.790030  10694.490230  10608.139480  10609.339840  1.044000e+05
2010-01-05 AAPL     6031.859863   6058.020020   6015.669922   6043.939941  1.175721e+08
           GOOGL    1132.989990   1133.869995   1116.560059   1116.560059  3.991400e+09
           TSM     10654.790030  10694.490230  10608.139480  10609.339840  1.044000e+05
>>> df_orig.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6 entries, (2010-01-04 00:00:00, AAPL) to (2010-01-05 00:00:00, TSM)
Data columns (total 5 columns):
Close     6 non-null float64
High      6 non-null float64
Low       6 non-null float64
Open      6 non-null float64
Volume    6 non-null float64
dtypes: float64(5)
memory usage: 410.0+ bytes
>>> df_orig.groupby("Ticker").apply(Indicator)
Date    2010-01-04  2010-01-05
Ticker
AAPL           NaN         NaN
GOOGL          NaN         NaN
TSM            NaN         NaN
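To see in isolation why the extra line helps, here is a minimal sketch on a made-up two-day frame (the values and the helper name `try_resample` are illustrative assumptions, not part of the original code). It checks that resample raises a TypeError while the group still carries the (Date, Ticker) MultiIndex, and works once only the Date level is kept:

```python
import pandas as pd

# Small synthetic frame with a (Date, Ticker) MultiIndex; the values are made up.
dates = pd.to_datetime(["2010-01-04", "2010-01-05"])
index = pd.MultiIndex.from_product([dates, ["AAPL", "TSM"]], names=["Date", "Ticker"])
frame = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0]}, index=index)


def try_resample(group):
    """Return True if resample works on this group's index, False if it raises TypeError."""
    try:
        group["Close"].resample("D").mean()
        return True
    except TypeError:
        return False


# One ticker's rows, still indexed by (Date, Ticker), like a group inside groupby-apply.
aapl = frame.xs("AAPL", level="Ticker", drop_level=False)
works_with_multiindex = try_resample(aapl)

# Same rows after keeping only the Date level, as in the suggested extra line.
flat = aapl.copy()
flat.index = flat.index.get_level_values(0)
works_with_dates = try_resample(flat)

print(works_with_multiindex, works_with_dates)
```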
Of course, you can also drop the Ticker level from the index before the groupby-apply step, like this:
ticker_idx = df_orig.index.get_level_values(1)
df_orig.reset_index(1, drop=True).groupby(ticker_idx).apply(Indicator)
This way you don't need to add the extra line to the function.
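Also, the groupby-apply operation gives me a bunch of NaN values, but looking at the code of your function, I think this is because the function expects more than 2 days of data. Let me know if this is correct.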
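The guess about the NaN values can be checked on a longer, synthetic sample. The sketch below is not the original function: it assumes the intent is "divide by the previous calendar month's mean", and it swaps the resample chain for a groupby on monthly periods (the helper names `prev_month_mean` and `indicator` and all the price data are made up). Under that assumption, every row in the first month of data has no previous month and comes out as NaN, while later months get values. It also shows one possible way to write the per-ticker result back as a new column, by returning each Series on the group's original index and concatenating:

```python
import numpy as np
import pandas as pd


def prev_month_mean(series):
    """Mean of the previous calendar month for each row (NaN if that month is absent)."""
    months = series.index.to_period("M")
    monthly = series.groupby(months).mean()
    # Look up month - 1 for every row; months missing from the data come back as NaN.
    return pd.Series(monthly.reindex(months - 1).to_numpy(), index=series.index)


def indicator(group):
    """Per-ticker ratio, returned on the group's original (Date, Ticker) index."""
    g = group.droplevel("Ticker")  # leaves a plain DatetimeIndex
    prev_close = g["Close"].shift(1)
    tr = pd.concat(
        [g["High"] - g["Low"], (g["High"] - prev_close).abs(), (g["Low"] - prev_close).abs()],
        axis=1,
    ).max(axis=1)
    ratio = (tr / prev_month_mean(tr)) / (g["Volume"] / prev_month_mean(g["Volume"]))
    return pd.Series(ratio.to_numpy(), index=group.index)


# Synthetic data covering three months for two tickers; the numbers are random.
dates = pd.bdate_range("2010-01-04", "2010-03-31")
index = pd.MultiIndex.from_product([dates, ["AAPL", "TSM"]], names=["Date", "Ticker"])
rng = np.random.default_rng(0)
close = 100 + rng.normal(0, 1, len(index)).cumsum()
df = pd.DataFrame(
    {
        "Close": close,
        "High": close + rng.uniform(0.5, 1.5, len(index)),
        "Low": close - rng.uniform(0.5, 1.5, len(index)),
        "Volume": rng.uniform(1e5, 2e5, len(index)),
    },
    index=index,
)

# Compute per ticker, then let column assignment align on the MultiIndex.
parts = [indicator(group) for _, group in df.groupby(level="Ticker")]
df["Indicator"] = pd.concat(parts)

is_january = df.index.get_level_values("Date") < pd.Timestamp("2010-02-01")
print(df.loc[is_january, "Indicator"].isna().all())
print(df.loc[~is_january, "Indicator"].notna().all())
```

With only two days of data, as in the sample above, every row falls in the first month, which would be consistent with the all-NaN result.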