Pandas rolling sum of time values

Time: 2018-11-07 16:42:56

Tags: python pandas timedelta

I am trying to get a rolling sum of time values in a data frame that looks like this:

    RunTime
0  00:51:25
1       NaT
2  00:42:16
3       NaT
4  00:40:15
5       NaT
6  00:50:13
7  00:53:28
8       NaT
9  00:37:32
10      NaT
11 01:53:22
12 01:08:22
13 00:59:57
14 00:12:22

Expected output:

     RunTime  RunTime_MS
0   00:51:25    
1   NaT         
2   00:42:16    
3   NaT         
4   00:40:15    
5   NaT         
6   00:50:13    3:04:09
7   00:53:28    3:06:12
8   NaT         3:06:12
9   00:37:32    3:01:28
10  NaT         3:01:28
11  01:53:22    4:14:35
12  01:08:22    5:22:57
13  00:59:57    5:32:41
14  00:12:22    4:51:35

For the other columns in the data frame I am working with (which contain floats), I use

dfExt['Distance_MS'] = dfExt['Distance'].fillna(value=0).rolling(window=7).sum()

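and this works fine. If I try the same thing on the time column, I get the error

ops for Rolling for this dtype timedelta64[ns] are not implemented

even though the documentation seems to indicate that .sum() is an operation you can perform on a timedelta.

Here is the sample code: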
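For reference, a minimal sketch of the plain (non-rolling) case that the documentation describes: calling `.sum()` directly on a `timedelta64[ns]` Series. The two values are the first two non-missing rows of the frame above; the variable names `run_time` and `total` are illustrative only and not part of the original code.

```python
import pandas as pd

# Two of the RunTime values from the frame above, parsed as timedelta64[ns]
run_time = pd.Series(pd.to_timedelta(['00:51:25', '00:42:16']))

# A plain reduction over the whole Series is supported for timedeltas
total = run_time.sum()
print(total)
```

So the reduction itself is available; the error above concerns the rolling window, not `.sum()` as such.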

import pandas as pd
from datetime import datetime, timedelta

RunTimeValues = ['00:51:25','','00:42:16','','00:40:15','','00:50:13','00:53:28','','00:37:32','','01:53:22','01:08:22','00:59:57','00:12:22']
for i in range(len(RunTimeValues)):
    if RunTimeValues[i] != '':
        t = datetime.strptime(RunTimeValues[i], "%H:%M:%S")
        RunTimeValues[i] = timedelta(hours=t.hour, minutes=t.minute, seconds=t.second)
    else:
        # Mark missing entries as NaT so the column is inferred as timedelta64[ns]
        RunTimeValues[i] = pd.NaT
dfExt = pd.DataFrame({'RunTime': RunTimeValues})
dfExt['RunTime_MS'] = dfExt['RunTime'].fillna(value=0).rolling(window=7).sum()
print(dfExt)

I know I could convert the timedelta to float hours and then sum, but the result is not quite what I want. Any suggestions?

2 answers:

Answer 0 (score: 3)

This will do it:

dfExt['RunTime_MS'] = pd.to_timedelta(dfExt['RunTime'].fillna(pd.Timedelta(0)).dt.total_seconds().rolling(window=7).sum(), unit='s')
print(dfExt)
    RunTime RunTime_MS
0  00:51:25        NaT
1       NaT        NaT
2  00:42:16        NaT
3       NaT        NaT
4  00:40:15        NaT
5       NaT        NaT
6  00:50:13   03:04:09
7  00:53:28   03:06:12
8       NaT   03:06:12
9  00:37:32   03:01:28
10      NaT   03:01:28
11 01:53:22   04:14:35
12 01:08:22   05:22:57
13 00:59:57   05:32:41
14 00:12:22   04:51:35
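The same idea as a self-contained sketch, in case it helps to run it in isolation. It assumes the missing entries are given as `None` and builds the column with `pd.to_timedelta`, which is a different construction from the loop in the question. Filling the gaps after converting to float seconds avoids filling a timedelta column with an integer, which may behave differently across pandas versions. The names `raw` and `seconds` are illustrative.

```python
import pandas as pd

# RunTime values from the question; None stands in for the missing entries
raw = ['00:51:25', None, '00:42:16', None, '00:40:15', None, '00:50:13',
       '00:53:28', None, '00:37:32', None, '01:53:22', '01:08:22',
       '00:59:57', '00:12:22']
dfExt = pd.DataFrame({'RunTime': pd.to_timedelta(raw)})

# Convert to float seconds, count missing runs as 0 s, then roll over 7 rows
seconds = dfExt['RunTime'].dt.total_seconds().fillna(0)

# Convert the rolling totals back to timedeltas (NaN becomes NaT)
dfExt['RunTime_MS'] = pd.to_timedelta(seconds.rolling(window=7).sum(), unit='s')
print(dfExt)
```

The first six rows stay NaT because `rolling(window=7)` needs seven rows by default; passing `min_periods=1` to `rolling` would produce partial sums there instead.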

Answer 1 (score: 0)

This is cumsum:

df.fillna(pd.to_timedelta('00:00:00')).cumsum()
Out[54]: 
    RunTime
0  00:51:25
1  00:51:25
2  01:33:41
3  01:33:41
4  02:13:56
5  02:13:56
6  03:04:09
7  03:57:37
8  03:57:37
9  04:35:09
10 04:35:09
11 06:28:31
12 07:36:53
13 08:36:50
14 08:49:12
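The cumulative sum can also be turned into the 7-row rolling total by subtracting the cumulative sum from 7 rows earlier. This is a sketch of that idea rather than part of the original answer; `window`, `cs` and `rolled` are illustrative names, and the input is rebuilt here with `None` for the missing entries.

```python
import pandas as pd

# RunTime values from the question; None stands in for the missing entries
raw = ['00:51:25', None, '00:42:16', None, '00:40:15', None, '00:50:13',
       '00:53:28', None, '00:37:32', None, '01:53:22', '01:08:22',
       '00:59:57', '00:12:22']
run_time = pd.Series(pd.to_timedelta(raw))

window = 7
# Running total, counting missing runs as zero duration
cs = run_time.fillna(pd.Timedelta(0)).cumsum()

# Rolling total = running total minus the running total `window` rows back
# (rows with fewer than `window` predecessors subtract zero)
rolled = cs - cs.shift(window).fillna(pd.Timedelta(0))

# Blank out the rows that do not yet have a full window
rolled.iloc[:window - 1] = pd.NaT
print(rolled)
```

This stays in timedelta arithmetic throughout, so no conversion to seconds or nanoseconds is needed.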

Rolling with numpy (rolling_apply is the helper defined at the end of this answer):

pd.to_timedelta(rolling_apply(sum,df.RunTime.fillna(pd.to_timedelta('00:00:00')).values,7),unit='ns')
Out[81]: 
TimedeltaIndex([       NaT,        NaT,        NaT,        NaT,        NaT,
                       NaT, '03:04:09', '03:06:12', '03:06:12', '03:01:28',
                '03:01:28', '04:14:35', '05:22:57', '05:32:41', '04:51:35'],
               dtype='timedelta64[ns]', freq=None)



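import numpy as np

def rolling_apply(fun, a, w):
    # Apply fun to each full window of width w; the first w - 1 results stay NaN
    if a.dtype.kind == 'm':
        # Use integer nanoseconds so timedelta64 values fit into a float array
        a = a.astype('int64')
    r = np.empty(a.shape)
    r.fill(np.nan)
    for i in range(w - 1, a.shape[0]):
        r[i] = fun(a[(i-w+1):i+1])
    return r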
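A loop-free variant of the same numpy idea, as a sketch only: it assumes a NumPy version that provides `numpy.lib.stride_tricks.sliding_window_view` (added in NumPy 1.20) and works on integer nanoseconds. The names `ns`, `sums`, `out` and `rolled_np` are illustrative, and the input is rebuilt here with `None` for the missing entries.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# RunTime values from the question; None stands in for the missing entries
raw = ['00:51:25', None, '00:42:16', None, '00:40:15', None, '00:50:13',
       '00:53:28', None, '00:37:32', None, '01:53:22', '01:08:22',
       '00:59:57', '00:12:22']
run_time = pd.Series(pd.to_timedelta(raw))

window = 7
# Missing runs count as zero; represent each duration as integer nanoseconds
filled = run_time.fillna(pd.Timedelta(0))
ns = filled.to_numpy(dtype='timedelta64[ns]').astype('int64')

# One row per full window, summed along the window axis
sums = sliding_window_view(ns, window).sum(axis=1)

# Pad the first window - 1 positions with NaN, which becomes NaT below
out = np.full(ns.shape, np.nan)
out[window - 1:] = sums
rolled_np = pd.to_timedelta(out, unit='ns')
print(rolled_np)
```

The float array is only there to hold NaN for the incomplete windows; the nanosecond totals in this data are small enough to be represented exactly as floats.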