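Pandas: multiple rolling periods

Time: 2017-09-10 18:50:37

Tags: python python-2.7 pandas

I want to compute the rolling mean and std for several rolling periods and several columns at the same time.

This is the code I use for rolling(5):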

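import numpy as np

def add_mean_std_cols(df):
    # Rolling mean and std of every column over a 5-row window
    res = df.rolling(5).agg(['mean','std'])

    # Flatten the (column, stat) MultiIndex into names like 'col0_mean'
    res.columns = res.columns.map('_'.join)

    # Interleave each original column with its own mean and std columns
    cols = np.concatenate(list(zip(df.columns, res.columns[0::2], res.columns[1::2])))

    final = res.join(df).loc[:, cols]
    return final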

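I would like to do rolling (5), (15), (30) and (45) periods in the same operation.

I thought about iterating over the periods, but I don't know how to avoid ending up with the rolling mean/std of a previous rolling mean/std...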

2 answers:

Answer 0: (score: 1)

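I would suggest creating a DataFrame with a MultiIndex as its columns. There is no way around using a loop here to iterate over your windows. The resulting layout is easy to index and easy to read back with pd.read_csv. Initialize an empty DataFrame with np.empty of the appropriate shape and use .loc to assign its values.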

import numpy as np
import pandas as pd
np.random.seed(123)

df = pd.DataFrame(np.random.randn(100,3)).add_prefix('col')

windows = [5, 15, 30, 45]
stats = ['mean', 'std']
cols = pd.MultiIndex.from_product([windows, df.columns, stats], 
                                  names=['window', 'feature', 'metric'])

df2 = pd.DataFrame(np.empty((df.shape[0], len(cols))), columns=cols,
                   index=df.index)

for window in windows:
    df2.loc[:, window] = df.rolling(window=window).agg(stats).values

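Your result, df2, now has the same index as the original object. It has 3 column levels: the first is the window, the second is the column from the original frame, and the third is the statistic.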

print(df2.shape)
(100, 24)

This makes it easy to inspect the values for a specific rolling window:

print(df2[5])  # Rolling window = 5
feature     col0              col1              col2         
metric      mean      std     mean      std     mean      std
0            NaN      NaN      NaN      NaN      NaN      NaN
1            NaN      NaN      NaN      NaN      NaN      NaN
2            NaN      NaN      NaN      NaN      NaN      NaN
3            NaN      NaN      NaN      NaN      NaN      NaN
4       -0.87879  1.45348 -0.26559  0.71236  0.53233  0.89430
..           ...      ...      ...      ...      ...      ...
95      -0.44231  1.02552 -1.22138  0.45140 -0.36440  0.95324
96      -0.58638  1.10246 -0.90165  0.79723 -0.44543  1.00166
97      -0.70564  0.85711 -0.42644  1.07174 -0.44766  1.00284
98      -0.95702  1.01302 -0.03705  1.05066  0.16437  1.32341
99      -0.57026  1.10978  0.08730  1.02438  0.39930  1.31240

print(df2[5]['col0'])  # Rolling window = 5, stats of col0 only
metric     mean      std
0           NaN      NaN
1           NaN      NaN
2           NaN      NaN
3           NaN      NaN
4      -0.87879  1.45348
..          ...      ...
95     -0.44231  1.02552
96     -0.58638  1.10246
97     -0.70564  0.85711
98     -0.95702  1.01302
99     -0.57026  1.10978

print(df2.loc[:, (5, slice(None), 'mean')]) # Rolling window = 5,
                                            # means of each column
window         5                  
feature     col0     col1     col2
metric      mean     mean     mean
0            NaN      NaN      NaN
1            NaN      NaN      NaN
2            NaN      NaN      NaN
3            NaN      NaN      NaN
4       -0.87879 -0.26559  0.53233
..           ...      ...      ...
95      -0.44231 -1.22138 -0.36440
96      -0.58638 -0.90165 -0.44543
97      -0.70564 -0.42644 -0.44766
98      -0.95702 -0.03705  0.16437
99      -0.57026  0.08730  0.39930
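
Selecting by level name is another option for this layout. The following is only a sketch: it rebuilds a frame with the same three column levels using pd.concat and set_names (an assumption made so the snippet runs on its own, not the construction used above), then uses DataFrame.xs to pull out one statistic or one original column across every window.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')

windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# One aggregated frame per window; the dict keys become the outer column level.
df2 = pd.concat({w: df.rolling(window=w).agg(stats) for w in windows}, axis=1)
df2.columns = df2.columns.set_names(['window', 'feature', 'metric'])

# Every rolling std, for all windows and all original columns.
all_std = df2.xs('std', axis=1, level='metric')

# Rolling mean and std of col0 only, for all windows.
col0_stats = df2.xs('col0', axis=1, level='feature')

print(all_std.columns.names)
print(col0_stats.shape)
```

By default xs drops the level it selects on, so all_std keeps only the window and feature levels.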

Finally, to make a single-indexed DataFrame, here is some use of itertools:

df = pd.DataFrame(np.random.randn(100,3)).add_prefix('col')

import itertools

# Build flat names in the same order as the concatenated data below:
# window first, then original column, then statistic (e.g. 'col0_mean_5').
names = ['{}_{}_{}'.format(col, stat, win)
         for win, col, stat in itertools.product(windows, df.columns, stats)]

df2 = [df.rolling(window=window).agg(stats).values for window in windows]
df2 = pd.DataFrame(np.concatenate(df2, axis=1), columns=names,
                   index=df.index)

Answer 1: (score: 0)

You can concatenate the output of several rolling aggregations:

windows = (5, 15, 30, 45)
rolling_dfs = (df.rolling(i)                                    # 1. Create window
                 .agg(['mean', 'std'])                          # 1. Aggregate
                 .rename(columns={col: '{0}_{1:d}'.format(col, i)
                                  for col in df.columns})       # 2. Rename columns
               for i in windows)                                # For each window

pd.concat((df, *rolling_dfs), axis=1)                           # 3. Concatenate dataframes

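It is not pretty, but as far as I understand it should do what you are looking for.

What it does:

  1. Creates a generator, rolling_dfs, that yields one aggregated DataFrame per rolling window size.
  2. Renames all columns so you can tell which rolling window size each one refers to.
  3. Concatenates the original df with the rolling-window results.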
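
The three steps above could also be wrapped in a small helper. This is a minimal sketch, not part of the answer itself: the function name add_rolling_stats and the flat naming scheme column_stat_window are assumptions chosen for illustration. Each pass rolls over the original df, so no window is ever computed on top of an earlier rolling result.

```python
import numpy as np
import pandas as pd

def add_rolling_stats(df, windows=(5, 15, 30, 45), stats=('mean', 'std')):
    """Return df plus one flat-named column per (column, stat, window)."""
    pieces = [df]
    for win in windows:
        # Always roll over the original frame, never over a previous result.
        rolled = df.rolling(win).agg(list(stats))
        # Flatten (column, stat) pairs into names such as 'col0_mean_5'.
        rolled.columns = ['{}_{}_{}'.format(col, stat, win)
                          for col, stat in rolled.columns]
        pieces.append(rolled)
    # Single concatenation along the columns at the end.
    return pd.concat(pieces, axis=1)

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
out = add_rolling_stats(df)
print(out.shape)
```

Collecting the pieces in a list and concatenating once avoids repeatedly growing a DataFrame inside the loop.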