Pandas: add rolling statistics to a DataFrame for all columns

Time: 2018-06-20 21:31:07

Tags: python pandas pandas-groupby

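I want to compute the rolling mean, rolling max and rolling min of a DataFrame, computed separately for each observation ID, and store the results in new columns. My real DataFrame has many columns, so naming each new column by hand isn't practical. Here is an example: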

from datetime import datetime, timedelta
import pandas as pd
import numpy as np

np.random.seed(11)
date_today = datetime.now()
ndays = 10
df = pd.DataFrame({'date': [date_today + timedelta(days=x) for x in range(ndays)],
                   'A': np.random.randn(ndays),
                   'B': np.random.randn(ndays)})
df = df.set_index('date')
# Randomly blank out roughly 60% of the values
df = df.mask(np.random.random(df.shape) < .6)

# Observation ID: the first 6 rows are ID 1, the last 4 rows are ID 2
df['ID'] = 1
df.iloc[6:10, df.columns.get_loc('ID')] = 2
df

Currently I'm doing it by hand, like this:

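# pd.rolling_mean / pd.rolling_max have since been removed from pandas;
# this is the same computation with the .rolling() method
rolled = df.groupby('ID')[['A', 'B']].rolling(2, min_periods=1)
df[['A_mean', 'B_mean']] = rolled.mean().reset_index(level=0, drop=True)
df[['A_max', 'B_max']] = rolled.max().reset_index(level=0, drop=True)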

Is there a way to do this without creating and naming each new column by hand?
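A minimal sketch of one way to avoid naming the new columns by hand, assuming a reasonably recent pandas (1.x or 2.x) where `df.groupby(...).rolling(...)` returns a result indexed by (ID, date). The frame below is a small hypothetical stand-in for the question's random data; every column except `ID` is treated as a value column, and the `_mean` / `_max` / `_min` suffixes are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: value columns plus an ID column
idx = pd.date_range('2018-06-20', periods=6, freq='D', name='date')
df = pd.DataFrame({'A': [1.0, np.nan, 3.0, 4.0, 5.0, np.nan],
                   'B': [2.0, 2.0, np.nan, 1.0, 0.0, 6.0],
                   'ID': [1, 1, 1, 2, 2, 2]}, index=idx)

# Every column except ID is a value column, so no column is listed by hand
value_cols = [c for c in df.columns if c != 'ID']

# Two-row rolling window, computed separately inside each ID group
rolled = df.groupby('ID')[value_cols].rolling(2, min_periods=1)

# One frame per statistic, placed side by side; columns become (stat, col)
stats = pd.concat({'mean': rolled.mean(),
                   'max': rolled.max(),
                   'min': rolled.min()}, axis=1)

# Flatten ('mean', 'A') -> 'A_mean' and drop the ID level from the row index
stats.columns = [f'{col}_{stat}' for stat, col in stats.columns]
stats = stats.reset_index(level=0, drop=True)

# Align on the date index and attach the new columns to the original frame
out = df.join(stats)
print(out)
```

Because each window stays inside its ID group, the first ID 2 row only sees its own value: `A_mean` is 4.0 there, whereas an ungrouped `df.rolling(2)` would average it with the last ID 1 row and give 3.5.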

1 Answer

Answer 0 (score: 2)

One approach is to aggregate each two-row window and concatenate the results:

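# Compute the statistics on each two-row window (rows i-1 and i) of columns A and B.
# Note: the range starts at 1, so the first date gets no row in the result,
# and the window ignores ID, so it can span two different IDs.
d = pd.concat({
    df.index[i]: df[['A', 'B']].iloc[max(0, i - 1):i + 1].agg(
        ['min', 'max', 'count', 'mean', 'sum', 'std']).rename(str.title)
    for i in range(1, len(df))
})

# d has a (date, statistic) row index; unstack moves the statistic names into the columns
d.unstack()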

                                   A                                                       B                                              
                                 Min       Max Count      Mean       Sum       Std       Min       Max Count      Mean       Sum       Std
2018-06-21 14:32:32.964218 -0.286073 -0.286073   1.0 -0.286073 -0.286073       NaN -0.886240 -0.475733   2.0 -0.680987 -1.361973  0.290272
2018-06-22 14:32:32.964218 -0.286073 -0.286073   1.0 -0.286073 -0.286073       NaN -0.475733  0.689682   2.0  0.106974  0.213949  0.824073
2018-06-23 14:32:32.964218 -2.653319 -2.653319   1.0 -2.653319 -2.653319       NaN  0.689682  0.689682   1.0  0.689682  0.689682       NaN
2018-06-24 14:32:32.964218 -2.653319 -2.653319   1.0 -2.653319 -2.653319       NaN -1.305549 -1.305549   1.0 -1.305549 -1.305549       NaN
2018-06-25 14:32:32.964218       NaN       NaN   0.0       NaN  0.000000       NaN -1.305549 -1.305549   1.0 -1.305549 -1.305549       NaN
2018-06-26 14:32:32.964218       NaN       NaN   0.0       NaN  0.000000       NaN       NaN       NaN   0.0       NaN  0.000000       NaN
2018-06-27 14:32:32.964218       NaN       NaN   0.0       NaN  0.000000       NaN       NaN       NaN   0.0       NaN  0.000000       NaN
2018-06-28 14:32:32.964218  0.421051  0.421051   1.0  0.421051  0.421051       NaN -0.031075 -0.031075   1.0 -0.031075 -0.031075       NaN
2018-06-29 14:32:32.964218 -1.065603  0.421051   2.0 -0.322276 -0.644552  1.051223 -0.031075 -0.031075   1.0 -0.031075 -0.031075       NaN
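The loop above hard-codes `['A', 'B']` and slides the window across the whole frame, so a window can straddle two IDs. Below is a sketch of an ID-aware variant that keeps the answer's title-cased statistic names, relying on `Rolling.agg` accepting a list of statistic names (as `DataFrame.agg` does). The small frame is hypothetical, and `count` is left out of the statistic list here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: value columns plus an ID column
idx = pd.date_range('2018-06-20', periods=6, freq='D', name='date')
df = pd.DataFrame({'A': [1.0, np.nan, 3.0, 4.0, 5.0, np.nan],
                   'B': [2.0, 2.0, np.nan, 1.0, 0.0, 6.0],
                   'ID': [1, 1, 1, 2, 2, 2]}, index=idx)

stats = ['min', 'max', 'mean', 'sum', 'std']
value_cols = [c for c in df.columns if c != 'ID']

# Run the same two-row rolling aggregation inside each ID group;
# group_keys=False keeps the plain date index on the combined result
rolled = df.groupby('ID', group_keys=False)[value_cols].apply(
    lambda g: g.rolling(2, min_periods=1).agg(stats))

# rolled has MultiIndex columns like ('A', 'min'); flatten them to 'A_Min'
rolled.columns = [f'{col}_{stat.title()}' for col, stat in rolled.columns]

# Align on the date index and attach the new columns to the original frame
out = df.join(rolled)
print(out)
```

Adding a statistic only means extending the `stats` list; the column names are generated from it rather than written out.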