Calculate the total time per event from the first and last record of each unique event in a pandas DataFrame

Asked: 2018-10-01 20:37:56

Tags: python pandas datetime time

I have a pandas DataFrame that looks like this:

  event_id           timestamp
0       e0 2015-07-20 12:00:56
1       e0 2015-07-20 13:00:56
2       e1 2015-07-20 01:30:00
3       e1 2015-07-20 02:30:00
4       e1 2015-07-20 03:00:00
5       e2 2015-07-20 18:45:00
6       e2 2015-07-20 18:47:00
7       e2 2015-07-20 18:48:00
8       e2 2015-07-20 18:49:00

I want to calculate the total time each event took, in minutes:

                   timestamp  count (minutes)
event_id                                     
e0       2015-07-20 13:00:56             60.0
e1       2015-07-20 03:00:00             90.0
e2       2015-07-20 18:49:00              4.0

3 answers:

Answer 0 (score: 2)

Use groupby and agg:

s = df.groupby('event_id').timestamp.diff().div(pd.Timedelta(minutes=1))

df.assign(minutes=s).groupby('event_id').agg({'timestamp': 'last', 'minutes': 'sum'})

                   timestamp  minutes
event_id
e0       2015-07-20 13:00:56     60.0
e1       2015-07-20 03:00:00     90.0
e2       2015-07-20 18:49:00      4.0
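
As a self-contained sketch of the approach above: diff() within each group gives the gap between consecutive rows, and summing those gaps telescopes to "last minus first", provided the timestamps are already in ascending order within each event. The DataFrame construction and the names `gaps` and `result` are assumptions made here for illustration; the sample values are copied from the question.

```python
import pandas as pd

# Sample data copied from the question; timestamps parsed as datetimes
df = pd.DataFrame({
    'event_id': ['e0', 'e0', 'e1', 'e1', 'e1', 'e2', 'e2', 'e2', 'e2'],
    'timestamp': pd.to_datetime([
        '2015-07-20 12:00:56', '2015-07-20 13:00:56',
        '2015-07-20 01:30:00', '2015-07-20 02:30:00', '2015-07-20 03:00:00',
        '2015-07-20 18:45:00', '2015-07-20 18:47:00',
        '2015-07-20 18:48:00', '2015-07-20 18:49:00',
    ]),
})

# Gap in minutes between consecutive rows of the same event
# (the first row of each event has no predecessor, so its gap is NaN)
gaps = df.groupby('event_id')['timestamp'].diff().div(pd.Timedelta(minutes=1))

# Keep the last timestamp per event and sum the gaps (sum skips NaN)
result = (
    df.assign(minutes=gaps)
      .groupby('event_id')
      .agg({'timestamp': 'last', 'minutes': 'sum'})
)
print(result)
```

If the rows might not be sorted, sorting by event_id and timestamp first keeps the sum of gaps equal to the total elapsed time.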

Answer 1 (score: 1)

Recreate the DataFrame:

import pandas as pd

df = pd.DataFrame([['e0','2015-07-20 12:00:56'],
    ['e0','2015-07-20 13:00:56'],
    ['e1','2015-07-20 01:30:00'],
    ['e1','2015-07-20 02:30:00'],
    ['e1','2015-07-20 03:00:00'],
    ['e2','2015-07-20 18:45:00'],
    ['e2','2015-07-20 18:47:00'],
    ['e2','2015-07-20 18:48:00'],
    ['e2','2015-07-20 18:49:00']],
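    columns=['event_id','timestamp'])

# Parse the strings as datetimes so that they can be subtracted
df['timestamp'] = pd.to_datetime(df['timestamp'])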

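You can use sort_values() to make sure the timestamp column is sorted within each event_id group. Then you can combine groupby(), apply(), and pd.Timedelta() to compute, for every row, the time elapsed in minutes since the first row of its group: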

df['count (minutes)'] = df.sort_values(['event_id','timestamp']).groupby('event_id', group_keys=False)['timestamp'].apply(lambda x: (x-x.iloc[0])/pd.Timedelta(1, 'm'))

Which gives:

  event_id           timestamp  count (minutes)
0       e0 2015-07-20 12:00:56              0.0
1       e0 2015-07-20 13:00:56             60.0
2       e1 2015-07-20 01:30:00              0.0
3       e1 2015-07-20 02:30:00             60.0
4       e1 2015-07-20 03:00:00             90.0
5       e2 2015-07-20 18:45:00              0.0
6       e2 2015-07-20 18:47:00              2.0
7       e2 2015-07-20 18:48:00              3.0
8       e2 2015-07-20 18:49:00              4.0

You can then call groupby() again and use last() to return the last row of each group:

df.groupby('event_id').last()

Which yields:

                   timestamp  count (minutes)
event_id                                     
e0       2015-07-20 13:00:56             60.0
e1       2015-07-20 03:00:00             90.0
e2       2015-07-20 18:49:00              4.0
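
As a cross-check on the result above, here is a minimal sketch (not part of this answer) that takes the latest timestamp minus the earliest per event. Because min and max do not depend on row order, no sorting is needed. The names `summary` and `total_minutes` are introduced here for illustration, and the rows are deliberately shuffled to show the order independence.

```python
import pandas as pd

# Sample data copied from the question; timestamps parsed as datetimes
df = pd.DataFrame({
    'event_id': ['e0', 'e0', 'e1', 'e1', 'e1', 'e2', 'e2', 'e2', 'e2'],
    'timestamp': pd.to_datetime([
        '2015-07-20 12:00:56', '2015-07-20 13:00:56',
        '2015-07-20 01:30:00', '2015-07-20 02:30:00', '2015-07-20 03:00:00',
        '2015-07-20 18:45:00', '2015-07-20 18:47:00',
        '2015-07-20 18:48:00', '2015-07-20 18:49:00',
    ]),
})

# Shuffle the rows on purpose: min/max are unaffected by row order
shuffled = df.sample(frac=1, random_state=0)

# Earliest and latest timestamp per event
summary = shuffled.groupby('event_id')['timestamp'].agg(['min', 'max'])

# Elapsed time per event, converted from a timedelta to minutes
summary['total_minutes'] = (summary['max'] - summary['min']) / pd.Timedelta(minutes=1)
print(summary)
```

The `max` column here plays the role of the last timestamp in the output above, and `total_minutes` should match the count (minutes) column.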

Answer 2 (score: 0)

You can try using groupby without sorting:

{