I am trying to get a rolling sum of time values in a DataFrame like this:
RunTime
0 00:51:25
1 NaT
2 00:42:16
3 NaT
4 00:40:15
5 NaT
6 00:50:13
7 00:53:28
8 NaT
9 00:37:32
10 NaT
11 01:53:22
12 01:08:22
13 00:59:57
14 00:12:22
Expected output:
RunTime RunTime_MS
0 00:51:25
1 NaT
2 00:42:16
3 NaT
4 00:40:15
5 NaT
6 00:50:13 3:04:09
7 00:53:28 3:06:12
8 NaT 3:06:12
9 00:37:32 3:01:28
10 NaT 3:01:28
11 01:53:22 4:14:35
12 01:08:22 5:22:57
13 00:59:57 5:32:41
14 00:12:22 4:51:35
For the other columns in the DataFrame I'm working with (which contain floats), I use:
dfExt['Distance_MS'] = dfExt['Distance'].fillna(value=0).rolling(window=7).sum()
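This works fine. If I try to do the same on the time column, I get the error
ops for Rolling for this dtype timedelta64[ns] are not implemented
even though the documentation seems to indicate that .sum() is an operation you can perform on timedeltas.
Here is the sample code: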
import pandas as pd
from datetime import datetime, timedelta
RunTimeValues = ['00:51:25','','00:42:16','','00:40:15','','00:50:13','00:53:28','','00:37:32','','01:53:22','01:08:22','00:59:57','00:12:22']
for i in range(len(RunTimeValues)):
    if RunTimeValues[i] != '':
        #RunTimeValues[i] = datetime.strptime(RunTimeValues[i], "%H:%M:%S")
        t = datetime.strptime(RunTimeValues[i],"%H:%M:%S")
        RunTimeValues[i] = timedelta(hours=t.hour, minutes=t.minute, seconds=t.second)
    else:
        # Mark empty entries as missing so the column is inferred as timedelta64[ns]
        RunTimeValues[i] = pd.NaT
dfExt = pd.DataFrame({'RunTime': RunTimeValues})
dfExt['RunTime_MS'] = dfExt['RunTime'].fillna(value=0).rolling(window=7).sum()
print(dfExt)
I know I could convert the timedeltas to float hours and then sum them, but the result is not quite what I want. Any suggestions?
Answer 0 (score: 3)
This will do it:
dfExt['RunTime_MS'] = pd.to_timedelta(dfExt['RunTime'].fillna(0).dt.total_seconds().rolling(window=7).sum(), unit='s')
print(dfExt)
RunTime RunTime_MS
0 00:51:25 NaT
1 NaT NaT
2 00:42:16 NaT
3 NaT NaT
4 00:40:15 NaT
5 NaT NaT
6 00:50:13 03:04:09
7 00:53:28 03:06:12
8 NaT 03:06:12
9 00:37:32 03:01:28
10 NaT 03:01:28
11 01:53:22 04:14:35
12 01:08:22 05:22:57
13 00:59:57 05:32:41
14 00:12:22 04:51:35
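The one-liner above works in three steps: convert the timedeltas to float seconds, take the rolling sum on those floats, and convert the result back with pd.to_timedelta. A minimal self-contained sketch of the same idea follows. The helper name rolling_timedelta_sum is only illustrative, and pd.Timedelta(0) is used as the fill value instead of the integer 0 on the assumption that this is safer across pandas versions.

```python
import pandas as pd


def rolling_timedelta_sum(s, window):
    """Rolling sum of a timedelta64 Series, counting NaT as zero.

    Hypothetical helper; it just wraps the steps from the answer above.
    """
    # 1. Replace NaT with a zero duration and convert to float seconds.
    seconds = s.fillna(pd.Timedelta(0)).dt.total_seconds()
    # 2. Rolling sum on plain floats, which rolling() supports.
    rolled = seconds.rolling(window=window).sum()
    # 3. Convert back to timedeltas; the leading NaN values become NaT.
    return pd.to_timedelta(rolled, unit="s")


run_time = pd.to_timedelta(pd.Series([
    "00:51:25", None, "00:42:16", None, "00:40:15", None, "00:50:13",
    "00:53:28", None, "00:37:32", None, "01:53:22", "01:08:22",
    "00:59:57", "00:12:22",
]))
result = rolling_timedelta_sum(run_time, window=7)
print(result)
```

Because the values pass through floats, results for durations with sub-second precision may need rounding, for example with result.dt.round("s").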
Answer 1 (score: 0)
This is a cumulative sum (cumsum):
df.fillna(pd.to_timedelta('00:00:00')).cumsum()
Out[54]:
RunTime
0 00:51:25
1 00:51:25
2 01:33:41
3 01:33:41
4 02:13:56
5 02:13:56
6 03:04:09
7 03:57:37
8 03:57:37
9 04:35:09
10 04:35:09
11 06:28:31
12 07:36:53
13 08:36:50
14 08:49:12
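The cumulative sum can also be turned into a rolling sum without leaving the timedelta dtype: a window of size w ending at row i equals cumsum[i] minus cumsum[i - w]. The sketch below is one way to do that and is not part of the original answer. The function name is made up, and it assumes Series.shift accepts fill_value (available in reasonably recent pandas versions).

```python
import pandas as pd


def rolling_sum_from_cumsum(s, window):
    """Rolling timedelta sum built from a cumulative sum (illustrative helper)."""
    # NaT counts as a zero duration, as in the cumsum above.
    csum = s.fillna(pd.Timedelta(0)).cumsum()
    # Subtract the running total from `window` rows earlier;
    # rows with no such predecessor subtract zero.
    rolled = csum - csum.shift(window, fill_value=pd.Timedelta(0))
    # Match rolling(): the first window-1 rows have no full window.
    rolled.iloc[: window - 1] = pd.NaT
    return rolled


run_time = pd.to_timedelta(pd.Series([
    "00:51:25", None, "00:42:16", None, "00:40:15", None, "00:50:13",
    "00:53:28", None, "00:37:32", None, "01:53:22", "01:08:22",
    "00:59:57", "00:12:22",
]))
result = rolling_sum_from_cumsum(run_time, window=7)
print(result)
```

This stays in integer nanoseconds internally, so no float rounding is involved.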
A rolling sum using numpy:
pd.to_timedelta(rolling_apply(sum,df.RunTime.fillna(pd.to_timedelta('00:00:00')).values,7),unit='ns')
Out[81]:
TimedeltaIndex([ NaT, NaT, NaT, NaT, NaT,
NaT, '03:04:09', '03:06:12', '03:06:12', '03:01:28',
'03:01:28', '04:14:35', '05:22:57', '05:32:41', '04:51:35'],
dtype='timedelta64[ns]', freq=None)
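# Helper used in the call above; it must be defined before that call runs.
import numpy as np

def rolling_apply(fun, a, w):
    # Result array, pre-filled with NaN for positions without a full window
    r = np.empty(a.shape)
    r.fill(np.nan)
    # Apply fun to each window of w consecutive values ending at index i
    for i in range(w - 1, a.shape[0]):
        r[i] = fun(a[(i-w+1):i+1])
    return r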
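For longer columns the Python loop in rolling_apply can be slow. A loop-free variant is sketched below. Note that it swaps in a different technique from the answer's: the timedeltas are converted to int64 nanoseconds and the window sums come from np.convolve with a kernel of ones. The name rolling_sum_ns is hypothetical.

```python
import numpy as np
import pandas as pd


def rolling_sum_ns(ns, window):
    """Rolling sum over a 1-D int64 array; the first window-1 slots are NaN."""
    out = np.full(ns.shape, np.nan)
    if len(ns) >= window:
        # Convolving with a kernel of ones sums each run of `window` values.
        kernel = np.ones(window, dtype=np.int64)
        out[window - 1:] = np.convolve(ns, kernel, mode="valid")
    return out


run_time = pd.to_timedelta(pd.Series([
    "00:51:25", None, "00:42:16", None, "00:40:15", None, "00:50:13",
    "00:53:28", None, "00:37:32", None, "01:53:22", "01:08:22",
    "00:59:57", "00:12:22",
]))
# Fill NaT with zero, then view the durations as integer nanoseconds.
ns = (run_time.fillna(pd.Timedelta(0))
      .to_numpy()
      .astype("timedelta64[ns]")
      .astype("int64"))
result = pd.to_timedelta(rolling_sum_ns(ns, 7), unit="ns")
print(result)
```

The output array is float64, so this is exact only while the window sums stay below 2**53 nanoseconds (roughly 104 days).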