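I want to compute the rolling mean, rolling max, and rolling min of a DataFrame, taking the observation ID into account, and store the results in new columns. My original DataFrame has many columns, so naming the new columns by hand is not practical. Here is an example: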
from datetime import datetime, timedelta
import pandas as pd
import numpy as np

np.random.seed(11)
date_today = datetime.now()
ndays = 10
df = pd.DataFrame({'date': [date_today + timedelta(days=x) for x in range(ndays)],
                   'A': pd.Series(np.random.randn(ndays)),
                   'B': pd.Series(np.random.randn(ndays))})
df = df.set_index('date')
# Randomly replace roughly 60% of the values with NaN
df = df.mask(np.random.random(df.shape) < .6)
df['ID'] = 1
# Rows 6-9 belong to a second observation; positional assignment avoids chained indexing
df.iloc[6:10, df.columns.get_loc('ID')] = 2
df
At the moment I am doing this manually, like this:
# pd.rolling_mean / pd.rolling_max were removed from pandas; .rolling() is the equivalent
rolled = df.groupby('ID')[['A', 'B']].rolling(2, min_periods=1)
df[['A_mean', 'B_mean']] = rolled.mean().droplevel('ID')
df[['A_max', 'B_max']] = rolled.max().droplevel('ID')
Is there a way to do this without having to create and name the new columns by hand?
Answer 0 (score: 2)
One approach:
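# For each row i (starting at the second row), aggregate the 2-row window ending at i.
# The windows are positional over the whole frame, so the ID column is not considered.
d = pd.concat({
    df.index[i]: df[['A', 'B']].iloc[max(0, i - 1):i + 1].agg(
        ['min', 'max', 'count', 'mean', 'sum', 'std']).rename(str.title)
    for i in range(1, len(df))
})
# Move the statistic names from the inner index level into the columns
d.unstack()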
                                    A                                                                 B
                                  Min        Max      Count       Mean        Sum        Std        Min        Max      Count       Mean        Sum        Std
2018-06-21 14:32:32.964218  -0.286073  -0.286073        1.0  -0.286073  -0.286073        NaN  -0.886240  -0.475733        2.0  -0.680987  -1.361973   0.290272
2018-06-22 14:32:32.964218  -0.286073  -0.286073        1.0  -0.286073  -0.286073        NaN  -0.475733   0.689682        2.0   0.106974   0.213949   0.824073
2018-06-23 14:32:32.964218  -2.653319  -2.653319        1.0  -2.653319  -2.653319        NaN   0.689682   0.689682        1.0   0.689682   0.689682        NaN
2018-06-24 14:32:32.964218  -2.653319  -2.653319        1.0  -2.653319  -2.653319        NaN  -1.305549  -1.305549        1.0  -1.305549  -1.305549        NaN
2018-06-25 14:32:32.964218        NaN        NaN        0.0        NaN   0.000000        NaN  -1.305549  -1.305549        1.0  -1.305549  -1.305549        NaN
2018-06-26 14:32:32.964218        NaN        NaN        0.0        NaN   0.000000        NaN        NaN        NaN        0.0        NaN   0.000000        NaN
2018-06-27 14:32:32.964218        NaN        NaN        0.0        NaN   0.000000        NaN        NaN        NaN        0.0        NaN   0.000000        NaN
2018-06-28 14:32:32.964218   0.421051   0.421051        1.0   0.421051   0.421051        NaN  -0.031075  -0.031075        1.0  -0.031075  -0.031075        NaN
2018-06-29 14:32:32.964218  -1.065603   0.421051        2.0  -0.322276  -0.644552   1.051223  -0.031075  -0.031075        1.0  -0.031075  -0.031075        NaN
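The windows in the loop above are taken by position over the whole frame, so they can span two different IDs. Here is a rough sketch of an alternative that keeps each window inside its ID group and derives the new column names automatically. It assumes a pandas version where groupby objects support `.rolling()` and where `droplevel` is available. The small sample frame and the names `value_cols`, `parts`, `stats` and `result` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame shaped like the question's data
idx = pd.date_range("2018-06-20", periods=6, freq="D", name="date")
df = pd.DataFrame(
    {"A": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0],
     "B": [np.nan, 2.0, 2.5, np.nan, 5.0, 7.0],
     "ID": [1, 1, 1, 2, 2, 2]},
    index=idx,
)

# Every column except the grouping key, so nothing is listed by hand
value_cols = list(df.columns.drop("ID"))

# 2-row rolling window computed separately within each ID
rolled = df.groupby("ID")[value_cols].rolling(2, min_periods=1)

# One block of columns per statistic, suffixed as e.g. A_mean, B_mean
parts = [getattr(rolled, stat)().add_suffix(f"_{stat}")
         for stat in ("mean", "max", "min")]

# The grouped result is indexed by (ID, date); drop ID to align with df
stats = pd.concat(parts, axis=1).droplevel("ID")
result = df.join(stats)
print(result)
```

Adding another statistic is then a matter of extending the tuple of method names, as long as each name is a method on the rolling object (for example `"sum"` or `"std"`).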