I have a data frame as shown below:
Doctor  Start                B_ID  Session  Finish
A       2020-01-18 12:00:00  1     S1       2020-01-18 12:33:00
A       2020-01-18 12:30:00  2     S1       2020-01-18 12:52:00
A       2020-01-18 13:00:00  3     S1       2020-01-18 13:23:00
A       2020-01-18 13:00:00  4     S1       2020-01-18 13:37:00
A       2020-01-18 13:30:00  5     S1       2020-01-18 13:56:00
A       2020-01-18 14:00:00  6     S3       2020-01-18 14:15:00
A       2020-01-18 14:00:00  7     S3       2020-01-18 14:28:00
A       2020-01-18 14:30:00  8     S3       2020-01-18 14:40:00
A       2020-01-18 14:30:00  9     S3       2020-01-18 15:01:00
A       2020-01-19 12:00:00  12    S2       2020-01-19 12:20:00
A       2020-01-19 12:30:00  13    S2       2020-01-19 12:40:00
A       2020-01-19 14:00:00  14    S2       2020-01-19 14:20:00
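From the data frame above, I would like to find the last start time and the last finish time for each session, and create an "expected_finish" column that is 30 minutes after the last start time.
Expected output: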
Session  last_start           last_finish          expected_finish
S1       2020-01-18 13:30:00  2020-01-18 13:56:00  2020-01-18 14:00:00
S3       2020-01-18 14:30:00  2020-01-18 15:01:00  2020-01-18 15:00:00
S2       2020-01-19 14:00:00  2020-01-19 14:20:00  2020-01-19 14:30:00
Explanation:
df['expected_finish'] = df['last_start'] + 30 minutes
Answer 0 (score: 1)
Use GroupBy.agg with named aggregation, then add 30 minutes to the last start time to create the new column:
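import pandas as pd

# Assumes Start and Finish are datetime columns and the rows are sorted by Start
df = df.groupby('Session').agg(last_start=('Start', 'last'),
                               last_finish=('Finish', 'last'))
# Add 30 minutes to the last start time of each session
df['expected_finish'] = df['last_start'] + pd.Timedelta(minutes=30)
print(df)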
                 last_start         last_finish     expected_finish
Session
S1      2020-01-18 13:30:00 2020-01-18 13:56:00 2020-01-18 14:00:00
S2      2020-01-19 14:00:00 2020-01-19 14:20:00 2020-01-19 14:30:00
S3      2020-01-18 14:30:00 2020-01-18 15:01:00 2020-01-18 15:00:00
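Note that the result above is sorted by Session (S1, S2, S3), while the expected output in the question lists the sessions in order of first appearance (S1, S3, S2). Also, 'last' picks the final row of each group, so it relies on the rows already being in chronological order. The following is a minimal, self-contained sketch of a variant that may be more robust: it rebuilds the sample data from the question, parses the timestamps with pd.to_datetime (an assumption, in case the columns are stored as strings), passes sort=False to keep the original session order, and uses 'max' instead of 'last'. Using 'max' is a swapped-in choice, not the approach from the answer; it gives the same result on this data, but in general the latest finish does not have to belong to the row with the latest start. The name summary is just illustrative.

```python
import pandas as pd

# Sample rows copied from the question: (Doctor, Start, B_ID, Session, Finish)
rows = [
    ("A", "2020-01-18 12:00:00", 1, "S1", "2020-01-18 12:33:00"),
    ("A", "2020-01-18 12:30:00", 2, "S1", "2020-01-18 12:52:00"),
    ("A", "2020-01-18 13:00:00", 3, "S1", "2020-01-18 13:23:00"),
    ("A", "2020-01-18 13:00:00", 4, "S1", "2020-01-18 13:37:00"),
    ("A", "2020-01-18 13:30:00", 5, "S1", "2020-01-18 13:56:00"),
    ("A", "2020-01-18 14:00:00", 6, "S3", "2020-01-18 14:15:00"),
    ("A", "2020-01-18 14:00:00", 7, "S3", "2020-01-18 14:28:00"),
    ("A", "2020-01-18 14:30:00", 8, "S3", "2020-01-18 14:40:00"),
    ("A", "2020-01-18 14:30:00", 9, "S3", "2020-01-18 15:01:00"),
    ("A", "2020-01-19 12:00:00", 12, "S2", "2020-01-19 12:20:00"),
    ("A", "2020-01-19 12:30:00", 13, "S2", "2020-01-19 12:40:00"),
    ("A", "2020-01-19 14:00:00", 14, "S2", "2020-01-19 14:20:00"),
]
df = pd.DataFrame(rows, columns=["Doctor", "Start", "B_ID", "Session", "Finish"])

# Assumption: the timestamps may be strings, so convert them explicitly.
df["Start"] = pd.to_datetime(df["Start"])
df["Finish"] = pd.to_datetime(df["Finish"])

# sort=False keeps sessions in order of first appearance (S1, S3, S2);
# 'max' does not depend on the row order within each session.
summary = (
    df.groupby("Session", sort=False)
      .agg(last_start=("Start", "max"), last_finish=("Finish", "max"))
      .reset_index()
)

# Expected finish is 30 minutes after the latest start of each session.
summary["expected_finish"] = summary["last_start"] + pd.Timedelta(minutes=30)
print(summary)
```

If the finish time of the chronologically last booking is what is needed rather than the latest finish overall, sorting first with df.sort_values("Start") and keeping 'last' as in the answer would be the closer match.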