Groupby last value of a datetime column in pandas

Time: 2020-04-09 09:37:57

Tags: pandas pandas-groupby

I have a data frame as shown below:

  Doctor   Start               B_ID  Session      Finish
    A   2020-01-18 12:00:00     1    S1         2020-01-18 12:33:00
    A   2020-01-18 12:30:00     2    S1         2020-01-18 12:52:00
    A   2020-01-18 13:00:00     3    S1         2020-01-18 13:23:00
    A   2020-01-18 13:00:00     4    S1         2020-01-18 13:37:00
    A   2020-01-18 13:30:00     5    S1         2020-01-18 13:56:00
    A   2020-01-18 14:00:00     6    S3         2020-01-18 14:15:00
    A   2020-01-18 14:00:00     7    S3         2020-01-18 14:28:00
    A   2020-01-18 14:30:00     8    S3         2020-01-18 14:40:00
    A   2020-01-18 14:30:00     9    S3         2020-01-18 15:01:00
    A   2020-01-19 12:00:00    12    S2         2020-01-19 12:20:00
    A   2020-01-19 12:30:00    13    S2         2020-01-19 12:40:00 
    A   2020-01-19 14:00:00    14    S2         2020-01-19 14:20:00

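From the above data frame, I want to find the last start time and the last finish time of each session, and create an "expected_finish" column whose value is 30 minutes after the last start time.

Expected output: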

Session   last_start             last_finish                 expected_finish
S1        2020-01-18 13:30:00    2020-01-18 13:56:00        2020-01-18 14:00:00
S3        2020-01-18 14:30:00    2020-01-18 15:01:00        2020-01-18 15:00:00
S2        2020-01-19 14:00:00    2020-01-19 14:20:00        2020-01-19 14:30:00

Explanation:

df['expected_finish'] = df['last_start'] + 30 minutes

1 answer:

Answer 0: (score: 1)

Use GroupBy.agg with named aggregation, then add 30 minutes to create the new column:

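import pandas as pd

# Named aggregation: take the last Start and the last Finish in each Session
df = df.groupby('Session').agg(last_start=('Start', 'last'),
                               last_finish=('Finish', 'last'))

# expected_finish is 30 minutes after the last start time
df['expected_finish'] = df['last_start'] + pd.Timedelta(minutes=30)
print(df)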
                 last_start         last_finish     expected_finish
Session                                                            
S1      2020-01-18 13:30:00 2020-01-18 13:56:00 2020-01-18 14:00:00
S2      2020-01-19 14:00:00 2020-01-19 14:20:00 2020-01-19 14:30:00
S3      2020-01-18 14:30:00 2020-01-18 15:01:00 2020-01-18 15:00:00
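
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It rests on a few assumptions that are not in the original answer: the Start and Finish columns may arrive as strings (so they are converted with pd.to_datetime), the rows may not already be in chronological order (so they are sorted first, because 'last' simply takes the final row of each group), and sort=False plus reset_index() are used only to reproduce the S1, S3, S2 row order shown in the question. Named aggregation needs pandas 0.25 or later.

```python
import pandas as pd

# Sample rows copied from the question.
rows = [
    ("A", "2020-01-18 12:00:00", 1, "S1", "2020-01-18 12:33:00"),
    ("A", "2020-01-18 12:30:00", 2, "S1", "2020-01-18 12:52:00"),
    ("A", "2020-01-18 13:00:00", 3, "S1", "2020-01-18 13:23:00"),
    ("A", "2020-01-18 13:00:00", 4, "S1", "2020-01-18 13:37:00"),
    ("A", "2020-01-18 13:30:00", 5, "S1", "2020-01-18 13:56:00"),
    ("A", "2020-01-18 14:00:00", 6, "S3", "2020-01-18 14:15:00"),
    ("A", "2020-01-18 14:00:00", 7, "S3", "2020-01-18 14:28:00"),
    ("A", "2020-01-18 14:30:00", 8, "S3", "2020-01-18 14:40:00"),
    ("A", "2020-01-18 14:30:00", 9, "S3", "2020-01-18 15:01:00"),
    ("A", "2020-01-19 12:00:00", 12, "S2", "2020-01-19 12:20:00"),
    ("A", "2020-01-19 12:30:00", 13, "S2", "2020-01-19 12:40:00"),
    ("A", "2020-01-19 14:00:00", 14, "S2", "2020-01-19 14:20:00"),
]
df = pd.DataFrame(rows, columns=["Doctor", "Start", "B_ID", "Session", "Finish"])

# Assumption: the columns may be strings, so convert them to datetime64.
df["Start"] = pd.to_datetime(df["Start"])
df["Finish"] = pd.to_datetime(df["Finish"])

# 'last' picks the final row of each group in row order, so sort by Start.
# mergesort is stable: rows sharing a Start keep their original order.
df = df.sort_values("Start", kind="mergesort")

# sort=False keeps the sessions in order of first appearance (S1, S3, S2).
result = (
    df.groupby("Session", sort=False)
      .agg(last_start=("Start", "last"), last_finish=("Finish", "last"))
      .reset_index()
)

# expected_finish is 30 minutes after the last start time.
result["expected_finish"] = result["last_start"] + pd.Timedelta(minutes=30)
print(result)
```

If "last" should mean the latest timestamp rather than the final row, replacing 'last' with 'max' in the agg call is one alternative; note that it picks Start and Finish independently, so the two values could come from different rows.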