# Price
0 1.00
1 12.23
2 3.24
3 12.67
6 149.98
7 19.98
8 1883.23
9 1.99
10 4.89
11 9.99
12 12.99
13 18.23
14 17.99
15 18.98
16 18.11
17 19.10
18 20.30
19 1901.30
20 20.27k
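Suppose I have the DataFrame above. I want to add two columns, mean_a and mean_b. mean_a should hold the average of the next k levels, and mean_b the average of the previous k levels. For example, for row #10 with k=3, mean_a = (4.89 + 9.99 + 12.99)/3 = 9.29 and mean_b = (4.89 + 1.99 + 1883.23)/3 = 630.0366667. How can I implement this in Python?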
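To make the definition concrete, here is a minimal plain-Python check of the row #10 example. It assumes that both windows include the current row, which is what the arithmetic above implies. The list holds only the five neighbouring prices from the table, and the variable names are illustrative.

```python
# Prices for rows 8 to 12 of the table above.
prices = [1883.23, 1.99, 4.89, 9.99, 12.99]
k = 3
i = 2  # position of row #10 within this short list

# Next k values, current row included.
window_a = prices[i:i + k]
# Previous k values, current row included (clipped at the start of the list).
window_b = prices[max(i - k + 1, 0):i + 1]

mean_a = sum(window_a) / len(window_a)
mean_b = sum(window_b) / len(window_b)
print(mean_a, mean_b)
```

Near the edges of the data the slices simply get shorter, so the mean is taken over fewer than k values.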
I have already tried the following, but I don't think it is very good:
def moving_average(self, df, col_name='smooth_midprice', k=10):
    ma_cols = []
    mb_cols = []
    temp_df = pd.DataFrame()
    for i in range(0, k+1):
        ma_col = 'M_A_{}'.format(i)
        ma_cols.append(ma_col)
        mb_col = 'M_B_{}'.format(i)
        mb_cols.append(mb_col)
        temp_df[ma_col] = df[col_name].shift(i)
        temp_df[mb_col] = df[col_name].shift(-i)
    df['M_A'] = temp_df[ma_cols].mean(axis=1, skipna=True, numeric_only=True)
    df['M_B'] = temp_df[mb_cols].mean(axis=1, skipna=True, numeric_only=True)
    return df
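Answer 0 (score: 3)
You can just use rolling (note that .iloc[::-1] reverses the order of the Series). In the output below, mean_a is the trailing average (current row and the previous two) and mean_b is the leading average (current row and the next two), so the names are swapped relative to the question: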
df['mean_a'] = df.Price.rolling(3,min_periods =1).mean()
df['mean_b'] = df.Price.iloc[::-1].rolling(3,min_periods =1).mean()
df
Out[9]:
Price mean_a mean_b
0 1.00 1.000000 5.490000
1 12.23 6.615000 9.380000
2 3.24 5.490000 55.296667
3 12.67 9.380000 60.876667
6 149.98 55.296667 684.396667
7 19.98 60.876667 635.066667
8 1883.23 684.396667 630.036667
9 1.99 635.066667 5.623333
10 4.89 630.036667 9.290000
11 9.99 5.623333 13.736667
12 12.99 9.290000 16.403333
13 18.23 13.736667 18.400000
14 17.99 16.403333 18.360000
15 18.98 18.400000 18.730000
16 18.11 18.360000 19.170000
17 19.10 18.730000 646.900000
18 20.30 19.170000 647.290000
19 1901.30 646.900000 960.785000
20 20.27 647.290000 20.270000
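The same idea can be wrapped in a function that takes k as a parameter. The following is only a sketch: `add_window_means` and its `col` argument are hypothetical names, the last price is taken as 20.27 (as in the output above), and the column names follow the question's definitions, so they are swapped relative to the output above.

```python
import pandas as pd

# Price column from the question; the final value appears as "20.27k" there
# and is assumed to be 20.27 here.
prices = [1.00, 12.23, 3.24, 12.67, 149.98, 19.98, 1883.23, 1.99, 4.89, 9.99,
          12.99, 18.23, 17.99, 18.98, 18.11, 19.10, 20.30, 1901.30, 20.27]
index = [0, 1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
df = pd.DataFrame({"Price": prices}, index=index)


def add_window_means(df, col="Price", k=3):
    """Hypothetical helper: add next-k and previous-k means, current row included."""
    out = df.copy()
    # Previous k rows: an ordinary trailing rolling window.
    out["mean_b"] = out[col].rolling(k, min_periods=1).mean()
    # Next k rows: roll over the reversed Series; the assignment realigns
    # the result with the original order by index label.
    out["mean_a"] = out[col].iloc[::-1].rolling(k, min_periods=1).mean()
    return out


result = add_window_means(df, k=3)
print(result.loc[10])
```

Because `min_periods=1` is used, rows near either end are averaged over fewer than k values instead of becoming NaN.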
Fixing your code:
col_name = 'Price'
k = 10
ma_cols = []
mb_cols = []
temp_df = pd.DataFrame()
for i in range(0, k + 1):
    ma_col = 'M_A_{}'.format(i)
    ma_cols.append(ma_col)
    mb_col = 'M_B_{}'.format(i)
    mb_cols.append(mb_col)
    temp_df[ma_col] = df[col_name].shift(i)
    temp_df[mb_col] = df[col_name].shift(-i)
# Keep the first 3 stacked values per row (change 3 to k), then average them.
# Series.mean(level=0) was removed in pandas 2.0; groupby(level=0).mean() replaces it.
df['M_A'] = temp_df[ma_cols].stack().groupby(level=0).head(3).groupby(level=0).mean()
df['M_B'] = temp_df[mb_cols].stack().groupby(level=0).head(3).groupby(level=0).mean()
df
Out[35]:
Price mean_a mean_b M_A M_B
0 1.00 1.000000 5.490000 1.000000 5.490000
1 12.23 6.615000 9.380000 6.615000 9.380000
2 3.24 5.490000 55.296667 5.490000 55.296667
3 12.67 9.380000 60.876667 9.380000 60.876667
6 149.98 55.296667 684.396667 55.296667 684.396667
7 19.98 60.876667 635.066667 60.876667 635.066667
8 1883.23 684.396667 630.036667 684.396667 630.036667
9 1.99 635.066667 5.623333 635.066667 5.623333
10 4.89 630.036667 9.290000 630.036667 9.290000
11 9.99 5.623333 13.736667 5.623333 13.736667
12 12.99 9.290000 16.403333 9.290000 16.403333
13 18.23 13.736667 18.400000 13.736667 18.400000
14 17.99 16.403333 18.360000 16.403333 18.360000
15 18.98 18.400000 18.730000 18.400000 18.730000
16 18.11 18.360000 19.170000 18.360000 19.170000
17 19.10 18.730000 646.900000 18.730000 646.900000
18 20.30 19.170000 647.290000 19.170000 647.290000
19 1901.30 646.900000 960.785000 646.900000 960.785000
20 20.27 647.290000 20.270000 647.290000 20.270000
Answer 1 (score: 2)
As @Wen said, you can compute mean_a with the rolling function:
df['mean_a'] = df['Price'].rolling(3).mean()
df['mean_b'] is then just df['mean_a'] shifted by -2:
df['mean_b'] = df['mean_a'].shift(-2)
This returns:
# Price mean_a mean_b
0 0 1.00 NaN 5.490000
1 1 12.23 NaN 9.380000
2 2 3.24 5.490000 55.296667
3 3 12.67 9.380000 60.876667
4 6 149.98 55.296667 684.396667
5 7 19.98 60.876667 635.066667
6 8 1883.23 684.396667 630.036667
7 9 1.99 635.066667 5.623333
8 10 4.89 630.036667 9.290000
9 11 9.99 5.623333 13.736667
10 12 12.99 9.290000 16.403333
11 13 18.23 13.736667 18.400000
12 14 17.99 16.403333 18.360000
13 15 18.98 18.400000 18.730000
14 16 18.11 18.360000 19.170000
15 17 19.10 18.730000 646.900000
16 18 20.30 19.170000 7397.200000
17 19 1901.30 646.900000 NaN
18 20 20270.00 7397.200000 NaN
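Edit:
If you want to avoid NA values, you need to use the min_periods parameter. We can create mean_a with df['mean_a'] = df['Price'].rolling(3, min_periods=1).mean(), but then mean_b can no longer be created by shifting alone, because the shift still leaves NaN in the last two rows. Apart from @Wen's approach, I can't think of another straightforward way: compute the reversed rolling mean of Price over just the rows where df['mean_b'] is NA: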
df['mean_b'] = df['mean_a'].shift(-2)
na_rows = df['mean_b'].isna()  # the trailing rows that the shift left empty
df.loc[na_rows, 'mean_b'] = df.loc[na_rows, 'Price'].iloc[::-1].rolling(3, min_periods=1).mean()
But it would probably be less involved to simply reverse the whole series in the first place.
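Another option that needs neither shifting nor reversing is a forward-looking window. The sketch below is not part of the original answer: it assumes pandas 1.1 or later, where `pandas.api.indexers.FixedForwardWindowIndexer` is available, and uses a handful of prices from the question for illustration. The names `next_mean` and `prev_mean` are made up for this example.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Rows 8 to 12 of the question's table, for illustration only.
s = pd.Series([1883.23, 1.99, 4.89, 9.99, 12.99],
              index=[8, 9, 10, 11, 12], name="Price")

k = 3
forward = FixedForwardWindowIndexer(window_size=k)

# Forward-looking window: the current row plus the next k-1 rows.
next_mean = s.rolling(window=forward, min_periods=1).mean()
# Ordinary trailing window: the current row plus the previous k-1 rows.
prev_mean = s.rolling(k, min_periods=1).mean()

print(pd.DataFrame({"Price": s, "next_mean": next_mean, "prev_mean": prev_mean}))
```

With `min_periods=1`, the last rows of `next_mean` and the first rows of `prev_mean` are averaged over the values that exist, so no NaN appears.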
Answer 2 (score: 1)
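def moving_average(df, k=10):
    # Series.append was removed in pandas 2.0, so the means are collected in lists.
    prices = df.iloc[:, 0]  # assumes a single price column, as in the question
    mean_a = []
    mean_b = []
    for i in range(df.shape[0]):
        # Current row and the next k-1 rows.
        mean_a.append(prices.iloc[i:i+k].mean())
        # Current row and the previous k-1 rows, clipped at the first row.
        start_b = i-k+1 if i-k+1 >= 0 else 0
        mean_b.append(prices.iloc[start_b:i+1].mean())
    hold = df.copy()
    # Lists are assigned by position, so gaps in the index (3 -> 6) do not misalign rows.
    hold["mean_a"] = mean_a
    hold["mean_b"] = mean_b
    return hold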