Add two columns, i.e. mean_a and mean_b

Time: 2018-06-10 16:22:47

Tags: python pandas dataframe

# Price
0  1.00
1  12.23
2  3.24
3  12.67
6  149.98
7  19.98
8  1883.23
9  1.99
10 4.89
11 9.99
12 12.99
13 18.23
14 17.99
15 18.98
16 18.11
17 19.10
18 20.30
19 1901.30
20 20.27k

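Suppose I have the DataFrame above. I want to add two columns, mean_a and mean_b. mean_a should hold the mean of the next k rows, and mean_b the mean of the previous k rows, with the current row counted in both windows. For example, for #10 with k=3: mean_a = (4.89 + 9.99 + 12.99)/3 = 9.29 and mean_b = (4.89 + 1.99 + 1883.23)/3 = 630.0366667. How can I implement this in Python?

I have tried the following, but I don't think it is very good: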

def moving_average(self, df, col_name='smooth_midprice', k=10):
        ma_cols = []
        mb_cols = []
        temp_df = pd.DataFrame()

        for i in range(0, k+1):
            ma_col = 'M_A_{}'.format(i)
            ma_cols.append(ma_col)
            mb_col = 'M_B_{}'.format(i)
            mb_cols.append(mb_col)
            temp_df[ma_col] = df[col_name].shift(i)
            temp_df[mb_col] = df[col_name].shift(-i)


        df['M_A'] = temp_df[ma_cols].mean(axis=1, skipna=True, numeric_only=True)
        df['M_B'] = temp_df[mb_cols].mean(axis=1, skipna=True, numeric_only=True)
        return df

3 answers:

Answer 0 (score: 3)

You can just use rolling (note that .iloc[::-1] reverses the order of the df):

df['mean_a'] = df.Price.rolling(3,min_periods =1).mean()
df['mean_b'] = df.Price.iloc[::-1].rolling(3,min_periods =1).mean()
df
Out[9]: 
      Price      mean_a      mean_b
0      1.00    1.000000    5.490000
1     12.23    6.615000    9.380000
2      3.24    5.490000   55.296667
3     12.67    9.380000   60.876667
6    149.98   55.296667  684.396667
7     19.98   60.876667  635.066667
8   1883.23  684.396667  630.036667
9      1.99  635.066667    5.623333
10     4.89  630.036667    9.290000
11     9.99    5.623333   13.736667
12    12.99    9.290000   16.403333
13    18.23   13.736667   18.400000
14    17.99   16.403333   18.360000
15    18.98   18.400000   18.730000
16    18.11   18.360000   19.170000
17    19.10   18.730000  646.900000
18    20.30   19.170000  647.290000
19  1901.30  646.900000  960.785000
20    20.27  647.290000   20.270000
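
For reference, here is a minimal self-contained sketch of the same rolling idea, with the columns named as in the question (mean_a looks forward, mean_b looks backward; in the output above the two names are the other way round). The helper name `add_window_means` is made up for illustration, and the last price is assumed to be 20.27, as in the output above.

```python
import pandas as pd

def add_window_means(df, col="Price", k=3):
    """Return a copy of df with forward (mean_a) and backward (mean_b) means.

    Both windows include the current row, as in the question's example.
    """
    out = df.copy()
    # Backward-looking: current row plus up to k-1 previous rows.
    out["mean_b"] = out[col].rolling(k, min_periods=1).mean()
    # Forward-looking: reverse, roll, and let index alignment restore the order.
    out["mean_a"] = out[col].iloc[::-1].rolling(k, min_periods=1).mean()
    return out

# Same index (4 and 5 are missing) and prices as in the question.
index = [0, 1, 2, 3] + list(range(6, 21))
prices = [1.00, 12.23, 3.24, 12.67, 149.98, 19.98, 1883.23, 1.99, 4.89, 9.99,
          12.99, 18.23, 17.99, 18.98, 18.11, 19.10, 20.30, 1901.30, 20.27]
df = pd.DataFrame({"Price": prices}, index=index)

result = add_window_means(df, k=3)
print(result.loc[10])
```

Assigning the reversed series back works because pandas aligns on the index labels, not on position, so no second reversal is needed.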

Fixing your code:

col_name='Price'

k=10
ma_cols = []
mb_cols = []
temp_df = pd.DataFrame()

for i in range(0, k + 1):
    ma_col = 'M_A_{}'.format(i)
    ma_cols.append(ma_col)
    mb_col = 'M_B_{}'.format(i)
    mb_cols.append(mb_col)
    temp_df[ma_col] = df[col_name].shift(i)
    temp_df[mb_col] = df[col_name].shift(-i)

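# Series.mean(level=0) was removed in pandas 2.0; group by the row label instead.
df['M_A'] = temp_df[ma_cols].stack().groupby(level=0).head(3).groupby(level=0).mean()  # change 3 to k
df['M_B'] = temp_df[mb_cols].stack().groupby(level=0).head(3).groupby(level=0).mean()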

df
Out[35]: 
      Price      mean_a      mean_b         M_A         M_B
0      1.00    1.000000    5.490000    1.000000    5.490000
1     12.23    6.615000    9.380000    6.615000    9.380000
2      3.24    5.490000   55.296667    5.490000   55.296667
3     12.67    9.380000   60.876667    9.380000   60.876667
6    149.98   55.296667  684.396667   55.296667  684.396667
7     19.98   60.876667  635.066667   60.876667  635.066667
8   1883.23  684.396667  630.036667  684.396667  630.036667
9      1.99  635.066667    5.623333  635.066667    5.623333
10     4.89  630.036667    9.290000  630.036667    9.290000
11     9.99    5.623333   13.736667    5.623333   13.736667
12    12.99    9.290000   16.403333    9.290000   16.403333
13    18.23   13.736667   18.400000   13.736667   18.400000
14    17.99   16.403333   18.360000   16.403333   18.360000
15    18.98   18.400000   18.730000   18.400000   18.730000
16    18.11   18.360000   19.170000   18.360000   19.170000
17    19.10   18.730000  646.900000   18.730000  646.900000
18    20.30   19.170000  647.290000   19.170000  647.290000
19  1901.30  646.900000  960.785000  646.900000  960.785000
20    20.27  647.290000   20.270000  647.290000   20.270000

Answer 1 (score: 2)

As @Wen said, you can compute mean_a with the rolling function:

df['mean_a'] = df['Price'].rolling(3).mean()

df['mean_b'] is then df['mean_a'] shifted by -2:

df['mean_b'] = df['mean_a'].shift(-2)

This returns:

    #   Price   mean_a      mean_b
0   0   1.00    NaN         5.490000
1   1   12.23   NaN         9.380000
2   2   3.24    5.490000    55.296667
3   3   12.67   9.380000    60.876667
4   6   149.98  55.296667   684.396667
5   7   19.98   60.876667   635.066667
6   8   1883.23 684.396667  630.036667
7   9   1.99    635.066667  5.623333
8   10  4.89    630.036667  9.290000
9   11  9.99    5.623333    13.736667
10  12  12.99   9.290000    16.403333
11  13  18.23   13.736667   18.400000
12  14  17.99   16.403333   18.360000
13  15  18.98   18.400000   18.730000
14  16  18.11   18.360000   19.170000
15  17  19.10   18.730000   646.900000
16  18  20.30   19.170000   7397.200000
17  19  1901.30 646.900000   NaN
18  20  20270.00 7397.200000 NaN
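
The shift trick generalises to other window sizes: the mean of a row and the k-1 rows after it equals the trailing k-row mean shifted up by k-1 positions. A small sketch of that relationship, using the first six prices from the question (the variable names are illustrative):

```python
import pandas as pd

prices = pd.Series([1.0, 12.23, 3.24, 12.67, 149.98, 19.98])
k = 3

# Trailing mean: current row plus the k-1 rows before it (NaN until k rows exist).
trailing = prices.rolling(k).mean()
# Shifting up by k-1 turns it into the mean of the current row and the next k-1 rows.
forward = trailing.shift(-(k - 1))

# Cross-check the first row against an explicit slice.
assert abs(forward.iloc[0] - prices.iloc[0:k].mean()) < 1e-9
print(forward)
```

The last k-1 entries of `forward` are NaN, because no full window exists there.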

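Edit

If you want to avoid NA values, you need to use the min_periods argument. We can create mean_a with df['mean_a'] = df['Price'].rolling(3, min_periods=1).mean(), but then mean_b can no longer be created by shifting alone. Apart from @Wen's approach I can't think of another straightforward way. You can reverse Price and roll over just the part of the series where df['mean_b'] is NA:

df['mean_b'] = df['mean_a'].shift(-2)
na_mask = df['mean_b'].isna()
df.loc[na_mask, 'mean_b'] = df.loc[na_mask, 'Price'].iloc[::-1].rolling(3, min_periods=1).mean()

But it is probably less involved to reverse the whole series in the first place.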
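
A different technique from the one in the answers, which avoids both the reversal and the shift, is a forward-looking window indexer. The sketch below assumes pandas 1.1 or later, where `pandas.api.indexers.FixedForwardWindowIndexer` is available, and uses a five-row excerpt of the question's data:

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

prices = pd.Series([1883.23, 1.99, 4.89, 9.99, 12.99], index=[8, 9, 10, 11, 12])
k = 3

# Window for row i covers rows i .. i+k-1 (shorter at the end of the series).
indexer = FixedForwardWindowIndexer(window_size=k)
mean_forward = prices.rolling(window=indexer, min_periods=1).mean()
# Ordinary trailing window: rows i-k+1 .. i (shorter at the start).
mean_backward = prices.rolling(k, min_periods=1).mean()

print(pd.DataFrame({"Price": prices,
                    "forward": mean_forward,
                    "backward": mean_backward}))
```

With min_periods=1 neither column contains NaN, so no patch-up step is needed at the edges.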

Answer 2 (score: 1)

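def moving_average(df, k=10, col_name='Price'):
    # Series.append was removed in pandas 2.0, so the means are collected in lists.
    # col_name is an added parameter; it defaults to the question's Price column.
    mean_a = []
    mean_b = []
    for i in range(df.shape[0]):
        # Mean of the current row and the next k-1 rows.
        mean_a.append(df[col_name].iloc[i:i+k].mean())
        # Mean of the current row and the previous k-1 rows.
        start_b = max(i - k + 1, 0)
        mean_b.append(df[col_name].iloc[start_b:i+1].mean())

    hold = df.copy()
    # Lists are assigned by position, so gaps in the index do not misalign rows.
    hold["mean_a"] = mean_a
    hold["mean_b"] = mean_b
    return hold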
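
An explicit loop like this is also handy as a reference to check the vectorised versions against. The following sketch is one way to do that; `loop_means` is a hypothetical helper and the five prices are an excerpt of the question's data:

```python
import pandas as pd

def loop_means(prices, k):
    """Brute-force forward and backward means; both windows include row i."""
    n = len(prices)
    forward = [prices.iloc[i:i + k].mean() for i in range(n)]
    backward = [prices.iloc[max(i - k + 1, 0):i + 1].mean() for i in range(n)]
    return (pd.Series(forward, index=prices.index),
            pd.Series(backward, index=prices.index))

prices = pd.Series([1.99, 4.89, 9.99, 12.99, 18.23], index=[9, 10, 11, 12, 13])
k = 3
forward, backward = loop_means(prices, k)

# The vectorised rolling versions should agree with the loop.
roll_backward = prices.rolling(k, min_periods=1).mean()
roll_forward = prices.iloc[::-1].rolling(k, min_periods=1).mean().sort_index()

pd.testing.assert_series_equal(forward, roll_forward)
pd.testing.assert_series_equal(backward, roll_backward)
```

The loop takes on the order of n*k steps, so for long series the rolling versions are the better choice once they have been checked.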