I would like to get the std for multiple rolling periods and multiple columns at the same time.
This is the code I use for rolling(5):
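import numpy as np

def add_mean_std_cols(df):
    # Rolling mean and std over a 5-row window for every column
    res = df.rolling(5).agg(['mean', 'std'])
    # Flatten the (column, stat) MultiIndex into names like 'col0_mean'
    res.columns = res.columns.map('_'.join)
    # Interleave each original column with its mean and std columns
    cols = np.concatenate(list(zip(df.columns, res.columns[0::2], res.columns[1::2])))
    final = res.join(df).loc[:, cols]
    return final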
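I would like to run the same operation for rolling(5), (15), (30) and (45) periods.
I thought about iterating over the periods, but I don't know how to avoid computing the rolling mean/std of the rolling mean/std columns...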
Answer 0 (score: 1)
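I'd suggest creating a DataFrame with a MultiIndex as its columns. There is no way around using a loop here to iterate over your windows. The resulting layout is easy to index and easy to read back with pd.read_csv. Initialize an empty DataFrame with np.empty of the appropriate shape and assign its values with .loc.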
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']
cols = pd.MultiIndex.from_product([windows, df.columns, stats],
                                  names=['window', 'feature', 'metric'])
df2 = pd.DataFrame(np.empty((df.shape[0], len(cols))), columns=cols,
                   index=df.index)
for window in windows:
    df2.loc[:, window] = df.rolling(window=window).agg(stats).values
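Now your result, df2, has the same index as the original object. It has 3 column levels: the first is the window, the second is the columns of the original frame, and the third is the statistic.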
print(df2.shape)
(100, 24)
The MultiIndex makes it easy to inspect the values for a specific rolling window:
print(df2[5]) # Rolling window = 5
feature     col0              col1              col2
metric      mean      std     mean      std     mean      std
0            NaN      NaN      NaN      NaN      NaN      NaN
1            NaN      NaN      NaN      NaN      NaN      NaN
2            NaN      NaN      NaN      NaN      NaN      NaN
3            NaN      NaN      NaN      NaN      NaN      NaN
4       -0.87879  1.45348 -0.26559  0.71236  0.53233  0.89430
..           ...      ...      ...      ...      ...      ...
95      -0.44231  1.02552 -1.22138  0.45140 -0.36440  0.95324
96      -0.58638  1.10246 -0.90165  0.79723 -0.44543  1.00166
97      -0.70564  0.85711 -0.42644  1.07174 -0.44766  1.00284
98      -0.95702  1.01302 -0.03705  1.05066  0.16437  1.32341
99      -0.57026  1.10978  0.08730  1.02438  0.39930  1.31240
print(df2[5]['col0']) # Rolling window = 5, stats of col0 only
metric     mean      std
0           NaN      NaN
1           NaN      NaN
2           NaN      NaN
3           NaN      NaN
4      -0.87879  1.45348
..          ...      ...
95     -0.44231  1.02552
96     -0.58638  1.10246
97     -0.70564  0.85711
98     -0.95702  1.01302
99     -0.57026  1.10978
print(df2.loc[:, (5, slice(None), 'mean')]) # Rolling window = 5,
# means of each column
window         5
feature     col0     col1     col2
metric      mean     mean     mean
0            NaN      NaN      NaN
1            NaN      NaN      NaN
2            NaN      NaN      NaN
3            NaN      NaN      NaN
4       -0.87879 -0.26559  0.53233
..           ...      ...      ...
95      -0.44231 -1.22138 -0.36440
96      -0.58638 -0.90165 -0.44543
97      -0.70564 -0.42644 -0.44766
98      -0.95702 -0.03705  0.16437
99      -0.57026  0.08730  0.39930
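As an alternative to tuple slicing with slice(None), the named column levels can be selected with DataFrame.xs. The sketch below is illustrative rather than part of the original answer. It rebuilds the three-level frame with np.hstack so it runs on its own, and the variable names all_means and col1_std are made up for the example.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Rebuild the three-level frame so this sketch is self-contained.
cols = pd.MultiIndex.from_product([windows, df.columns, stats],
                                  names=['window', 'feature', 'metric'])
values = np.hstack([df.rolling(w).agg(stats).to_numpy() for w in windows])
df2 = pd.DataFrame(values, columns=cols, index=df.index)

# Every rolling mean, for all windows and features.
# The selected 'metric' level is dropped from the result.
all_means = df2.xs('mean', axis=1, level='metric')

# Rolling std of col1 for all windows; drop_level=False keeps all 3 levels.
col1_std = df2.xs(('col1', 'std'), axis=1, level=['feature', 'metric'],
                  drop_level=False)

print(all_means.shape)  # one column per (window, feature) pair
print(col1_std.shape)   # one column per window
```

Selecting by level name avoids having to remember the position of each level inside the tuple.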
Finally, to build a single-index DataFrame, itertools.product can generate the flat column names in the same order as the concatenated values (window, then column, then statistic):
import itertools

df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
# Names must follow the layout of the concatenated arrays:
# window (outermost), then column, then statistic (innermost)
names = ['{}_{}_{}'.format(col, stat, win)
         for win, col, stat in itertools.product(windows, df.columns, stats)]
df2 = [df.rolling(window=window).agg(stats).values for window in windows]
df2 = pd.DataFrame(np.concatenate(df2, axis=1), columns=names,
                   index=df.index)
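If the MultiIndex frame already exists, a flat frame can also be derived from it by joining each column tuple into one string, which keeps labels and data aligned by construction. This is a minimal sketch under that assumption. The names multi and flat are illustrative, and the frame is rebuilt here only so the block runs on its own.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Three-level frame: (window, feature, metric).
cols = pd.MultiIndex.from_product([windows, df.columns, stats],
                                  names=['window', 'feature', 'metric'])
values = np.hstack([df.rolling(w).agg(stats).to_numpy() for w in windows])
multi = pd.DataFrame(values, columns=cols, index=df.index)

# Iterating a MultiIndex yields one tuple per column, in column order,
# so each flat name is built from the labels of the column it describes.
flat = multi.copy()
flat.columns = ['{}_{}_{}'.format(feature, metric, window)
                for window, feature, metric in multi.columns]

print(flat.columns[:4].tolist())
```

A quick spot check such as comparing flat['col1_std_15'] with df['col1'].rolling(15).std() confirms that a given label points at the intended values.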
Answer 1 (score: 0)
You can concatenate the outputs of multiple rolling aggregations:
windows = (5, 15, 30, 45)
rolling_dfs = (df.rolling(i)                                    # 1. Create window
                 .agg(['mean', 'std'])                          # 1. Aggregate
                 .rename(columns={col: '{0}_{1:d}'.format(col, i)
                                  for col in df.columns})       # 2. Rename columns
               for i in windows)                                # For each window
pd.concat((df, *rolling_dfs), axis=1)                           # 3. Concatenate dataframes
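It's not pretty, but it should do what I understand you are looking for.
What it does:
1. Creates a generator, rolling_dfs, that holds the aggregated DataFrame for each rolling window size.
2. Renames the columns so that each name carries its window size.
3. Concatenates df with the rolling-window results.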
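A variation on the same idea, offered as a sketch rather than as this answer's method: passing a dict to pd.concat turns the dict keys into the outermost column level, so the window size is recorded without renaming anything. The level names and the rolled, flat and result variables are assumptions for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = (5, 15, 30, 45)

# Dict keys (the window sizes) become the outermost column level.
rolled = pd.concat({w: df.rolling(w).agg(['mean', 'std']) for w in windows},
                   axis=1)
rolled.columns.names = ['window', 'feature', 'metric']

# Optional: flatten to names like 'col0_5_mean' and attach the original data.
flat = rolled.copy()
flat.columns = ['{}_{}_{}'.format(feature, window, metric)
                for window, feature, metric in rolled.columns]
result = df.join(flat)

print(rolled.shape, result.shape)
```

Every aggregation is computed from the original df, so no rolling statistic is ever taken over a column that is itself a rolling statistic.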