Groupby on a time series with overlapping intervals

Date: 2016-08-10 21:01:50

Tags: python pandas dataframe group-by time-series

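I have a time series in a Python pandas DataFrame and I want to create groups based on the index, but the groups should overlap, i.e. they are not disjoint. header_sec is the index column. Each group covers a 2-second window. Input DataFrame: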

    header_sec
1  17004 days 22:17:13 
2  17004 days 22:17:13 
3  17004 days 22:17:13 
4  17004 days 22:17:13 
5  17004 days 22:17:14
6  17004 days 22:17:14
7  17004 days 22:17:14
8  17004 days 22:17:14
9  17004 days 22:17:15
10 17004 days 22:17:15
11 17004 days 22:17:15
12 17004 days 22:17:15
13 17004 days 22:17:16
14 17004 days 22:17:16
15 17004 days 22:17:16
16 17004 days 22:17:16
17 17004 days 22:17:17
18 17004 days 22:17:17
19 17004 days 22:17:17
20 17004 days 22:17:17

My first group should be:

1  17004 days 22:17:13 
2  17004 days 22:17:13 
3  17004 days 22:17:13 
4  17004 days 22:17:13 
5  17004 days 22:17:14
6  17004 days 22:17:14
7  17004 days 22:17:14
8  17004 days 22:17:14

The second group starts from the previous index and takes half of the records from the preceding second:

7  17004 days 22:17:14
8  17004 days 22:17:14
9  17004 days 22:17:15
10 17004 days 22:17:15
11 17004 days 22:17:15
12 17004 days 22:17:15
13 17004 days 22:17:16
14 17004 days 22:17:16

The third group:

13 17004 days 22:17:16
14 17004 days 22:17:16
15 17004 days 22:17:16
16 17004 days 22:17:16
17 17004 days 22:17:17
18 17004 days 22:17:17
19 17004 days 22:17:17
20 17004 days 22:17:17

If I group on the index,

  dfgroup = df.groupby(df.index)

I get one group per second. What is the best way to merge these groups?

1 answer:

Answer 0: (score: 1)

Here is one technique:

import pandas as pd # if you have not already done this

grouped = df.groupby(df.index)

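one_sec = pd.to_timedelta(1, unit='s')

for name, group in grouped:
    try:
        # A list key always returns a DataFrame, even if only one row matches
        prev_sec = df.loc[[name - one_sec]]
    except KeyError:
        # No rows exist for the previous second (e.g. at the first group)
        prev_sec = pd.DataFrame(columns=group.columns)
    try:
        next_sec = df.loc[[name + one_sec]]
    except KeyError:
        # No rows exist for the next second (e.g. at the last group)
        next_sec = pd.DataFrame(columns=group.columns)
    Pn = 2 # replace this with int(len(prev_sec)/2) to get half rows from previous second
    Nn = 2 # replace this with int(len(next_sec)/2) to get half rows from next second
    # Last Pn rows of the previous second + this second + first Nn rows of the next second
    group = pd.concat([prev_sec.iloc[-Pn:], group, next_sec.iloc[:Nn]])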

    # Replace the below lines with your operations
    print(name, group)
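
Note that the loop above produces one padded group per second. If the data is regularly sampled, the exact groups listed in the question (rows 1-8, 7-14, 13-20) can also be reproduced with plain positional slicing. The following is only a minimal sketch of that alternative, not the method from the answer: it assumes exactly four rows per second as in the sample, and the `row` column and the `overlapping_windows` helper are hypothetical names introduced for illustration.

```python
import pandas as pd

# Rebuild the question's sample: 20 rows, four per second, indexed by a Timedelta.
start = pd.Timedelta(days=17004, hours=22, minutes=17, seconds=13)
index = pd.TimedeltaIndex(
    [start + pd.Timedelta(seconds=i // 4) for i in range(20)],
    name="header_sec",
)
# `row` is a hypothetical column holding the row labels 1..20 from the question.
df = pd.DataFrame({"row": range(1, 21)}, index=index)


def overlapping_windows(frame, size, step):
    """Yield positional slices of `size` rows, advancing `step` rows each time.

    Hypothetical helper: assumes the rows are already sorted by the index
    and that every second holds the same number of rows.
    """
    for start_pos in range(0, len(frame) - size + 1, step):
        yield frame.iloc[start_pos:start_pos + size]


# 8 rows span 2 seconds; stepping by 6 rows leaves a 2-row overlap between windows.
windows = list(overlapping_windows(df, size=8, step=6))

for window in windows:
    # Replace this line with your own per-group operations
    print(window["row"].tolist())
```

With irregular sampling the row counts per second vary, so positional slicing no longer lines up with the clock; in that case the per-second loop above is likely the safer starting point.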