Elegant way to generate an indexable sliding-window time series

Time: 2018-12-15 11:15:43

Tags: python pandas pytorch

To implement the __getitem__() method of PyTorch's Dataset class, the data needs to support indexing, so that dataset[i] returns the i-th sample.

Say I have a time series ser:

2017-12-29 14:44:00    69.90
2017-12-29 14:45:00    69.91
2017-12-29 14:46:00    69.87
2017-12-29 14:47:00    69.85
2017-12-29 14:48:00    69.86
2017-12-29 14:49:00    69.92
2017-12-29 14:50:00    69.90
2017-12-29 14:51:00    70.00
2017-12-29 14:52:00    69.97
2017-12-29 14:53:00    69.99
2017-12-29 14:54:00    69.99
2017-12-29 14:55:00    69.85

Because I need to index into the rolling windows, I use the following to generate windows of length 3:

l3_list = list()

def t(x):
    l3_list.append(x.copy())
    return 0  # rolling().apply() needs a numeric return value

# raw=True passes each window as a NumPy array instead of a Series
ser.rolling(3).apply(t, raw=True)

after which l3_list becomes:

[array([69.9 , 69.91, 69.87]),
 array([69.91, 69.87, 69.85]),
 array([69.87, 69.85, 69.86]),
 array([69.85, 69.86, 69.92]),
 array([69.86, 69.92, 69.9 ]),
 array([69.92, 69.9 , 70.  ]),
 array([69.9 , 70.  , 69.97]),
 array([70.  , 69.97, 69.99]),
 array([69.97, 69.99, 69.99]),
 array([69.99, 69.99, 69.85])]

so that I can index into l3_list, i.e. l3_list[i] is the i-th sliding window. Is there a more memory-efficient way to do this?
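One memory-efficient alternative is not to store the windows at all: keep the underlying 1-D array once and slice the i-th window out on demand inside __getitem__. The sketch below is a plain Python class; the name SlidingWindowDataset is hypothetical, and it only relies on the map-style protocol (__len__ and __getitem__) that torch.utils.data.Dataset is built around, so in real PyTorch code you would subclass Dataset and possibly convert windows to tensors.

```python
import numpy as np
import pandas as pd


class SlidingWindowDataset:
    """Hypothetical map-style dataset: sample i is values[i : i + window]."""

    def __init__(self, series: pd.Series, window: int):
        if window < 1:
            raise ValueError("window must be >= 1")
        # Store the data once, as a contiguous float64 array.
        self.values = np.ascontiguousarray(series.to_numpy(dtype=np.float64))
        self.window = window

    def __len__(self) -> int:
        # Number of complete windows (0 if the series is shorter than one window).
        return max(len(self.values) - self.window + 1, 0)

    def __getitem__(self, i: int) -> np.ndarray:
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Basic slicing returns a view, so no window data is copied.
        return self.values[i : i + self.window]


ser = pd.Series(
    [69.90, 69.91, 69.87, 69.85, 69.86, 69.92,
     69.90, 70.00, 69.97, 69.99, 69.99, 69.85],
    index=pd.date_range("2017-12-29 14:44", periods=12, freq="min"),
)

ds = SlidingWindowDataset(ser, 3)
print(len(ds))           # 10
print(ds[0].tolist())    # [69.9, 69.91, 69.87]
print(ds[-1].tolist())   # [69.99, 69.99, 69.85]
```

The design trades a cheap slice per __getitem__ call for keeping only one copy of the series in memory, instead of one stored array per window.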

2 answers:

Answer 0 (score: 0)

You can add a column, somewhat like what is done here: Pandas rolling window to return an array

import numpy as np
import pandas as pd
from io import StringIO

data = """
2017-12-29 14:44:00  69.90
2017-12-29 14:45:00  69.91
2017-12-29 14:46:00  69.87
2017-12-29 14:47:00  69.85
2017-12-29 14:48:00  69.86
2017-12-29 14:49:00  69.92
2017-12-29 14:50:00  69.90
2017-12-29 14:51:00  70.00
2017-12-29 14:52:00  69.97
2017-12-29 14:53:00  69.99
2017-12-29 14:54:00  69.99
2017-12-29 14:55:00  69.85
"""

df = pd.read_csv(StringIO(data), sep=r'\s+', header=None)

w = 3  # window length
values = df[2].to_numpy()
stride = np.lib.stride_tricks.as_strided
# Only len(values) - w + 1 complete windows fit in the buffer; a larger shape
# makes as_strided read memory past the end of the array (garbage values).
arr = stride(values, shape=(len(values) - w + 1, w), strides=values.strides * 2)
# The last w - 1 rows have no complete window and are left as NaN.
df['array'] = pd.Series(arr.tolist(), index=df.index[:len(arr)])

             0         1      2                  array
0   2017-12-29  14:44:00  69.90   [69.9, 69.91, 69.87]
1   2017-12-29  14:45:00  69.91  [69.91, 69.87, 69.85]
2   2017-12-29  14:46:00  69.87  [69.87, 69.85, 69.86]
3   2017-12-29  14:47:00  69.85  [69.85, 69.86, 69.92]
4   2017-12-29  14:48:00  69.86   [69.86, 69.92, 69.9]
5   2017-12-29  14:49:00  69.92    [69.92, 69.9, 70.0]
6   2017-12-29  14:50:00  69.90    [69.9, 70.0, 69.97]
7   2017-12-29  14:51:00  70.00   [70.0, 69.97, 69.99]
8   2017-12-29  14:52:00  69.97  [69.97, 69.99, 69.99]
9   2017-12-29  14:53:00  69.99  [69.99, 69.99, 69.85]
10  2017-12-29  14:54:00  69.99                    NaN
11  2017-12-29  14:55:00  69.85                    NaN
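as_strided does no bounds checking, so the shape passed to it must not exceed the number of complete windows, len(values) - window + 1; otherwise it reads memory past the end of the array. Assuming NumPy 1.20 or newer is available, numpy.lib.stride_tricks.sliding_window_view computes that shape itself and returns a read-only view. A minimal sketch with the prices from the question:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

values = np.array([69.90, 69.91, 69.87, 69.85, 69.86, 69.92,
                   69.90, 70.00, 69.97, 69.99, 69.99, 69.85])

# Shape is (len(values) - 3 + 1, 3) = (10, 3): only complete windows, no out-of-bounds rows.
windows = sliding_window_view(values, window_shape=3)

print(windows.shape)                      # (10, 3)
print(windows[9].tolist())                # [69.99, 69.99, 69.85]
print(np.shares_memory(windows, values))  # True: a view, not a copy
print(windows.flags.writeable)            # False by default
```

Note that calling .tolist() on the windows, as the answer does to fill a DataFrame column, builds new Python lists and gives up the memory saving; for the Dataset use case, keeping windows and returning windows[i] from __getitem__ is enough.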

Answer 1 (score: 0)

Here is another trick for getting the sliding windows:

Setup:

import pandas as pd

d = {pd.Timestamp('2017-12-29 14:44:00'): 69.9,
 pd.Timestamp('2017-12-29 14:45:00'): 69.91,
 pd.Timestamp('2017-12-29 14:46:00'): 69.87,
 pd.Timestamp('2017-12-29 14:47:00'): 69.85,
 pd.Timestamp('2017-12-29 14:48:00'): 69.86,
 pd.Timestamp('2017-12-29 14:49:00'): 69.92,
 pd.Timestamp('2017-12-29 14:50:00'): 69.9,
 pd.Timestamp('2017-12-29 14:51:00'): 70.0,
 pd.Timestamp('2017-12-29 14:52:00'): 69.97,
 pd.Timestamp('2017-12-29 14:53:00'): 69.99,
 pd.Timestamp('2017-12-29 14:54:00'): 69.99,
 pd.Timestamp('2017-12-29 14:55:00'): 69.85}

ser = pd.Series(d)

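Use rolling with apply, appending each window to an empty list. Since list.append returns None, "... or 0" makes the lambda return 0, the numeric value apply requires: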

lol = []
ser.rolling(3).apply((lambda x: lol.append(x.values) or 0), raw=False)
lol

Output:

[array([69.9 , 69.91, 69.87]),
 array([69.91, 69.87, 69.85]),
 array([69.87, 69.85, 69.86]),
 array([69.85, 69.86, 69.92]),
 array([69.86, 69.92, 69.9 ]),
 array([69.92, 69.9 , 70.  ]),
 array([69.9 , 70.  , 69.97]),
 array([70.  , 69.97, 69.99]),
 array([69.97, 69.99, 69.99]),
 array([69.99, 69.99, 69.85])]
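The same list can also be built without relying on a side effect inside apply, by slicing the underlying array directly. With basic slicing, each list element is a view into the series' data rather than a copy (each view still carries a small ndarray object overhead). A sketch, where window is an illustrative variable name:

```python
import numpy as np
import pandas as pd

ser = pd.Series(
    [69.9, 69.91, 69.87, 69.85, 69.86, 69.92,
     69.9, 70.0, 69.97, 69.99, 69.99, 69.85],
    index=pd.date_range("2017-12-29 14:44", periods=12, freq="min"),
)

window = 3
values = ser.to_numpy()
# One view per complete window; the window data itself is not copied.
lol = [values[i : i + window] for i in range(len(values) - window + 1)]

print(len(lol))                          # 10
print(lol[0].tolist())                   # [69.9, 69.91, 69.87]
print(np.shares_memory(lol[0], values))  # True
```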