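I would like to add two columns [std_dev, mean], where the sample used for the mean and standard deviation expands as the dates progress for a given location.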
location date temp std_dev mean
NY 2014-02-01 60
NY 2014-02-02 55
NY 2014-02-03 70
NY 2014-02-04 80
LA 2014-02-01 80
LA 2014-02-02 85
LA 2014-02-03 75
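I found a post explaining rolling mean/std that I can apply to the table. However, I get an error for std_dev because the number of rows per location is not a fixed value. How can I refer to the window size without fixing it?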
pandas rolling on a shifted dataframe
df['mean'] = df.groupby('location')['temp'].apply(pd.rolling_mean,4,min_periods=2).shift(1)
df['std_dev'] = df.groupby('location')['temp'].apply(pd.rolling_std,4,min_periods=2).shift(1)
Any help is much appreciated!
Answer 0 (score: 2)
I think you are looking for expanding, for example:
>>> df
temp location
0 60 NY
1 55 NY
2 70 NY
3 80 NY
4 80 LA
5 85 LA
6 75 LA
>>> expander = df.groupby('location').temp.expanding(min_periods=2)
>>> orderify = lambda x: x.reset_index(level=0, drop=True).sort_index()
>>> df['mean'], df['std'] = map(orderify, [expander.mean(), expander.std()])
>>> df
location temp mean std
0 NY 60 NaN NaN
1 NY 55 57.500000 3.535534
2 NY 70 61.666667 7.637626
3 NY 80 66.250000 11.086779
4 LA 80 NaN NaN
5 LA 85 82.500000 3.535534
6 LA 75 80.000000 5.000000
Note: it would be preferable to use .agg on expander, but as of version 0.19.2, agg is not implemented for groupby.rolling or groupby.expanding, so it cannot be used here.
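For readers who want to reproduce this from scratch, here is a minimal self-contained sketch of the same per-location expanding mean and standard deviation. It assumes a reasonably recent pandas and that the rows are already in date order within each location. The column name prev_mean is hypothetical; it only illustrates the "exclude the current day" behaviour that the question's .shift(1) appears to be aiming for.

```python
import pandas as pd

# Sample data from the question; rows are assumed to be in date order
# within each location.
df = pd.DataFrame({
    "location": ["NY", "NY", "NY", "NY", "LA", "LA", "LA"],
    "date": pd.to_datetime([
        "2014-02-01", "2014-02-02", "2014-02-03", "2014-02-04",
        "2014-02-01", "2014-02-02", "2014-02-03",
    ]),
    "temp": [60, 55, 70, 80, 80, 85, 75],
})

# Expanding window per location; at least two observations are required
# before a value is produced.
expander = df.groupby("location")["temp"].expanding(min_periods=2)

# The result is indexed by (location, original row label). Dropping the
# location level lets pandas align the values back onto df by row label.
df["mean"] = expander.mean().reset_index(level=0, drop=True)
df["std_dev"] = expander.std().reset_index(level=0, drop=True)

# Hypothetical extra column: statistics up to, but not including, the
# current day. Shifting inside the groupby keeps values from leaking
# from one location into the next.
df["prev_mean"] = df.groupby("location")["mean"].shift(1)

print(df)
```

Shifting within the groupby, rather than on the combined result as in the question, avoids carrying the last NY value into the first LA row.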