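Resampling a pandas time series that contains elapsed-time values

Asked: 2014-03-05 12:54:37

Tags: python pandas

I have time series data in the format shown at the bottom of this post.

I want to resample the data to 30-minute intervals, but I need the time-in-state values (given in whole seconds) to be split across the correct intervals accordingly.

Now suppose one row has a time in state of 2342 seconds (more than 30 minutes) and a start time of 08:22:00.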

User    Start Date  Start Time  State   Time in State (secs)
J.Doe   03-02-2014  08:22:00    A       2342

Once the resampling is done, I need that time in state split across the periods it spills into, like this:

User    Start Date  Time Period State   Time in State (secs)
J.Doe   03-02-2014  08:00:00    A       480
J.Doe   03-02-2014  08:30:00    A       1800
J.Doe   03-02-2014  09:00:00    A       62

480 + 1800 + 62 = 2342
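
The arithmetic above can be checked with a minimal sketch in plain Python. The helper name `split_into_bins` and its parameters are made up for illustration, and the date is read month-first (2 March 2014), which is an assumption about the sample data.

```python
from datetime import datetime, timedelta

def split_into_bins(start, seconds, bin_minutes=30):
    """Split a duration beginning at `start` across fixed-width bins.

    Returns a list of (bin_start, seconds_in_bin) tuples.
    Hypothetical helper, for illustration only.
    """
    width = timedelta(minutes=bin_minutes)
    end = start + timedelta(seconds=seconds)
    # Floor the start time to the beginning of its bin (bins are aligned to midnight).
    midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
    bin_start = midnight + ((start - midnight) // width) * width
    pieces = []
    while bin_start < end:
        bin_end = bin_start + width
        # Overlap between [start, end) and the current bin [bin_start, bin_end).
        overlap = min(end, bin_end) - max(start, bin_start)
        pieces.append((bin_start, int(overlap.total_seconds())))
        bin_start = bin_end
    return pieces

pieces = split_into_bins(datetime(2014, 3, 2, 8, 22), 2342)
for bin_start, secs in pieces:
    print(bin_start.time(), secs)
# 08:00:00 480
# 08:30:00 1800
# 09:00:00 62
```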

I'm completely lost as to how to achieve this in pandas... I'd appreciate any help :-)

Source data format:

User    Start Date  Start Time  State   Time in State (secs)
J.Doe   03-02-2014  07:58:00    A       36
J.Doe   03-02-2014  07:59:00    A       43
J.Doe   03-02-2014  08:00:00    A       59
J.Doe   03-02-2014  08:01:00    A       32
J.Doe   03-02-2014  08:21:00    A       15
J.Doe   03-02-2014  08:22:00    B       3
J.Doe   03-02-2014  08:22:00    A       2342
J.Doe   03-02-2014  09:01:00    B       1
J.Doe   03-02-2014  09:01:00    A       375
J.Doe   03-02-2014  09:07:00    B       3
J.Doe   03-02-2014  09:07:00    A       6408
J.Doe   03-02-2014  10:54:00    B       2
J.Doe   03-02-2014  10:54:00    A       116
J.Doe   03-02-2014  10:58:00    B       2
J.Doe   03-02-2014  10:58:00    A       122
J.Doe   03-02-2014  10:58:00    A       12
J.Doe   03-02-2014  11:00:00    B       2
J.Doe   03-02-2014  11:00:00    A       3417
J.Doe   03-02-2014  11:57:00    B       3
J.Doe   03-02-2014  11:57:00    A       120
J.Doe   03-02-2014  11:59:00    C       165
J.Doe   03-02-2014  12:02:00    B       3
J.Doe   03-02-2014  12:02:00    A       7254

1 Answer:

Answer 0 (score: 1)

I'd start by creating Start and End columns (as datetime64 objects):

In [11]: df['Start'] = pd.to_datetime(df['Start Date'] + ' ' + df['Start Time'])

In [12]: df['End'] = df['Start'] + df['Time in State (secs)'].apply(pd.offsets.Second)

In [13]: row = df.iloc[6, :]

In [14]: row
Out[14]: 
User                                  J.Doe
Start Date                       03-02-2014
Start Time                         08:22:00
State                                     A
Time in State (secs)                   2342
Start                   2014-03-02 08:22:00
End                     2014-03-02 09:01:02
Name: 6, dtype: object
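
On a recent pandas version the same two columns can probably be built with `pd.to_timedelta` instead of applying an offset to each value. The following is a sketch, not the answer's original code. The two-row frame is a cut-down copy of the sample data, and the explicit month-first `format` is an assumption that matches the parsed date shown in Out[14].

```python
import pandas as pd

# Two rows copied from the sample data, just to keep the example small.
df = pd.DataFrame({
    "User": ["J.Doe", "J.Doe"],
    "Start Date": ["03-02-2014", "03-02-2014"],
    "Start Time": ["08:22:00", "09:01:00"],
    "State": ["A", "B"],
    "Time in State (secs)": [2342, 1],
})

# Assumption: dates are month-first; an explicit format avoids any ambiguity.
df["Start"] = pd.to_datetime(
    df["Start Date"] + " " + df["Start Time"], format="%m-%d-%Y %H:%M:%S"
)
# Convert the whole column of seconds to timedeltas in one vectorised call.
df["End"] = df["Start"] + pd.to_timedelta(df["Time in State (secs)"], unit="s")

print(df[["Start", "End"]])
```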

One way to get the split times is to resample between Start and End, union the result with the two endpoints, and use diff:

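def split_times(row):
    # written against pandas 0.13.1; assumes pandas is imported as pd and numpy as np
    y = pd.Series(0, [row['Start'], row['End']])
    # union of the 30-minute bin edges with Start and End; this fills in the middle and sorts too
    splits = y.resample('30min').index + y.index
    # time from each split point to the next one (the last entry is NaT)
    res = -splits.to_series().diff(-1)
    # drop the leading gap (bin start to Start) and the trailing NaT;
    # caveat: if Start lies exactly on a 30-minute boundary there is no leading gap,
    # so the first real interval is dropped (see rows 2, 16 and 17 in the output below)
    if len(res) > 2: res = res[1:-1]
    elif len(res) == 2: res = res[1:]
    return res.astype(int).resample('30min').astype(np.timedelta64)  # hack to resample again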

In [16]: split_times(row)
Out[16]: 
2014-03-02 08:00:00   00:08:00
2014-03-02 08:30:00   00:30:00
2014-03-02 09:00:00   00:01:02
dtype: timedelta64[ns]
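
The resample-and-diff trick above depends on how `resample` and index addition behaved in pandas 0.13.1. On a current pandas, the same split can be sketched directly from the bin edges. The function name `split_times_modern` and its signature are hypothetical, and this version takes the two timestamps rather than a row.

```python
import pandas as pd

def split_times_modern(start, end, freq="30min"):
    """Split the interval [start, end) across fixed bins of width `freq`.

    Returns a Series of Timedelta values indexed by the start of each bin.
    Hypothetical helper; assumes end >= start.
    """
    step = pd.Timedelta(freq)
    # Bin starts, from the bin containing `start` up to the last one not after `end`.
    bins = pd.date_range(start.floor(freq), end, freq=freq)
    # Overlap of [start, end) with each bin [b, b + step).
    durations = [min(end, b + step) - max(start, b) for b in bins]
    out = pd.Series(durations, index=bins)
    # Drop empty pieces (for example when `end` falls exactly on a bin edge).
    return out[out > pd.Timedelta(0)]

row6 = split_times_modern(pd.Timestamp("2014-03-02 08:22:00"),
                          pd.Timestamp("2014-03-02 09:01:02"))
print(row6)
```

Because the overlap is computed per bin, a Start that sits exactly on a 30-minute boundary (such as 11:00:00 with 3417 seconds) keeps its first full interval.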

In [17]: df.apply(split_times, 1)
Out[17]: 
    2014-03-02 07:30:00  2014-03-02 08:00:00  2014-03-02 08:30:00  2014-03-02 09:00:00  2014-03-02 09:30:00  2014-03-02 10:00:00  2014-03-02 10:30:00  2014-03-02 11:00:00  2014-03-02 11:30:00  2014-03-02 12:00:00  2014-03-02 12:30:00  2014-03-02 13:00:00  2014-03-02 13:30:00  2014-03-02 14:00:00
0              00:00:36                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
1              00:00:43                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
2                   NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
3                   NaT             00:00:32                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
4                   NaT             00:00:15                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
5                   NaT             00:00:03                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
6                   NaT             00:08:00             00:30:00             00:01:02                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
7                   NaT                  NaT                  NaT             00:00:01                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
8                   NaT                  NaT                  NaT             00:06:15                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
9                   NaT                  NaT                  NaT             00:00:03                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
10                  NaT                  NaT                  NaT             00:23:00             00:30:00             00:30:00             00:23:48                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
11                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT             00:00:02                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
12                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT             00:01:56                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
13                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT             00:00:02                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
14                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT             00:02:00             00:00:02                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
15                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT             00:00:12                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
16                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT
17                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT             00:26:57                  NaT                  NaT                  NaT                  NaT                  NaT
18                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT             00:00:03                  NaT                  NaT                  NaT                  NaT                  NaT
19                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT             00:02:00                  NaT                  NaT                  NaT                  NaT                  NaT
20                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT             00:01:00             00:01:45                  NaT                  NaT                  NaT                  NaT
21                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT             00:00:03                  NaT                  NaT                  NaT                  NaT
22                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT                  NaT             00:28:00             00:30:00             00:30:00             00:30:00             00:02:54

To replace the NaT values with 0, it looks like you have to fiddle around a bit in 0.13.1 (this may already be fixed in master; if not, it's a bug):

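res2 = df.apply(split_times, 1).astype(int)
# hack to replace NaTs with 0: viewed as int64, NaT is the minimum int64 value
nat_as_int = np.iinfo(np.int64).min  # -9223372036854775808
res2.where(res2 != nat_as_int, 0).astype(np.timedelta64)
# to just get the seconds (the integers are nanoseconds)
seconds = res2.where(res2 != nat_as_int, 0) / 10 ** 9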
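
On a recent pandas the integer sentinel should not be needed. As a sketch (not tested against 0.13.1), each column can be converted to seconds first, which turns NaT into NaN, and the gaps can then be filled with 0. The small `wide` frame below is an illustrative stand-in that copies rows 6 and 7 of the output above.

```python
import pandas as pd

# Rows 6 and 7 of the wide output above, restricted to three columns.
wide = pd.DataFrame(
    {
        pd.Timestamp("2014-03-02 08:00:00"): pd.to_timedelta(["00:08:00", None]),
        pd.Timestamp("2014-03-02 08:30:00"): pd.to_timedelta(["00:30:00", None]),
        pd.Timestamp("2014-03-02 09:00:00"): pd.to_timedelta(["00:01:02", "00:00:01"]),
    },
    index=[6, 7],
)

# total_seconds() maps NaT to NaN, which fillna(0) then replaces with 0.
seconds = wide.apply(lambda col: col.dt.total_seconds()).fillna(0)
print(seconds)
```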