Calculate days until the next event from date differences in a dataframe

Time: 2018-02-06 21:36:26

Tags: python-3.x pandas datetime

I have a pandas dataframe for which I want to compute days_until_next_event:

df = pd.DataFrame({'message_count': [1, 3, 5, 6, 2, 8, 10, 2], 'event_date': ['2016-01-05', '2016-01-05', '2016-01-05', '2016-01-13', '2016-01-13', '2016-01-13', '2016-01-28', '2016-01-28'], 'message_date': ['2016-01-05', '2016-01-06', '2016-01-10', '2016-01-13', '2016-01-16', '2016-01-22', '2016-01-28', '2016-01-30']})

event_date  message_count   message_date
2016-01-05       1           2016-01-05
2016-01-05       3           2016-01-06
2016-01-05       5           2016-01-10
2016-01-13       6           2016-01-13
2016-01-13       2           2016-01-16
2016-01-13       8           2016-01-22
2016-01-28       10          2016-01-28
2016-01-28       2           2016-01-30

The expected dataframe looks like this:

days_until_next_event   event_date  message_count   message_date    
      0 days            2016-01-05       1           2016-01-05 
      7 days            2016-01-05       3           2016-01-06 
      3 days            2016-01-05       5           2016-01-10 
      0 days            2016-01-13       6           2016-01-13 
      12 days           2016-01-13       2           2016-01-16 
      6 days            2016-01-13       8           2016-01-22 
      0 days            2016-01-28      10           2016-01-28 
      NaT               2016-01-28       2           2016-01-30 

days_until_next_event is the difference between message_date and the next event_date. If the two dates are the same, its value is 0. I was able to get the days since the last event:

df2['days_since_last_dte'] = [(message - event) for message, event in zip(df2['message_date'], df2['event_date'])]

But I am having trouble adding the last piece: comparing against the next "new" event_date.

2 answers:

Answer 0: (score: 4)

IIUC (PS: this assumes your df is already sorted; if it is not, sort it first with sort_values)

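# Map each event_date to the next distinct event_date (no match for the last one)
df['New']=df.event_date.map(pd.Series(df.event_date.unique()[1:],index=df.event_date.unique()[:-1]))

# Assumed missing step, implied by the output below: difference to the next event.
# Requires both columns to be datetime, e.g. converted with pd.to_datetime.
df['DiffDays']=df['New']-df['message_date']

# Set the first row of each event_date group to 0
df.loc[df.groupby('event_date').head(1).index,'DiffDays']=0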

df
Out[1191]: 
   event_date  message_count message_date        New          DiffDays
0  2016-01-05              1   2016-01-05 2016-01-13                 0
1  2016-01-05              3   2016-01-06 2016-01-13   7 days 00:00:00
2  2016-01-05              5   2016-01-10 2016-01-13   3 days 00:00:00
3  2016-01-13              6   2016-01-13 2016-01-28                 0
4  2016-01-13              2   2016-01-16 2016-01-28  12 days 00:00:00
5  2016-01-13              8   2016-01-22 2016-01-28   6 days 00:00:00
6  2016-01-28             10   2016-01-28        NaT                 0
7  2016-01-28              2   2016-01-30        NaT               NaT
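
The lookup Series is the core of this answer, so here is a minimal sketch of how it behaves on its own. The names event_dates, uniq, next_event and mapped are illustrative and not part of the answer; the dates are a shortened version of the sample data, converted with pd.to_datetime.

```python
import pandas as pd

# Event dates in order of appearance (assumes the frame is already sorted)
event_dates = pd.Series(pd.to_datetime(
    ['2016-01-05', '2016-01-05', '2016-01-13', '2016-01-28']))
uniq = event_dates.unique()

# Lookup: index = every event date except the last, value = the one after it
next_event = pd.Series(uniq[1:], index=uniq[:-1])

# The last event date has no key in the lookup, so map() leaves it missing (NaT)
mapped = event_dates.map(next_event)
print(mapped)
```

Because the last event date never appears in the lookup index, its rows have no "next" date, which is why New is NaT for the 2016-01-28 rows.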

Answer 1: (score: 3)

Use bfill

df.event_date = pd.to_datetime(df.event_date)
df.message_date = pd.to_datetime(df.message_date)

Create a new column 'next_event_date', filled only on the first row of each event:

df['next_event_date'] = df.loc[df.event_date != df.event_date.shift(1),'event_date']

Backfill next_event_date so the data lines up:

df['next_event_date'] = df['next_event_date'].bfill()

Subtract message_date from next_event_date:

df['days_until_next_event'] = df['next_event_date'] - df['message_date']
print(df)

Output:

  event_date  message_count message_date next_event_date days_until_next_event
0 2016-01-05              1   2016-01-05      2016-01-05                0 days
1 2016-01-05              3   2016-01-06      2016-01-13                7 days
2 2016-01-05              5   2016-01-10      2016-01-13                3 days
3 2016-01-13              6   2016-01-13      2016-01-13                0 days
4 2016-01-13              2   2016-01-16      2016-01-28               12 days
5 2016-01-13              8   2016-01-22      2016-01-28                6 days
6 2016-01-28             10   2016-01-28      2016-01-28                0 days
7 2016-01-28              2   2016-01-30             NaT                   NaT
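
The steps of this answer can be collected into one function. This is only a sketch: add_days_until_next_event is a hypothetical helper name, and the sort_values call is an added assumption so that unsorted input is handled as well.

```python
import pandas as pd

def add_days_until_next_event(df):
    """Return a copy of df with a days_until_next_event column (hypothetical helper)."""
    out = df.copy()
    out['event_date'] = pd.to_datetime(out['event_date'])
    out['message_date'] = pd.to_datetime(out['message_date'])
    # Sort so rows of the same event are contiguous and in date order
    out = out.sort_values(['event_date', 'message_date']).reset_index(drop=True)
    # Keep event_date only on the first row of each run of equal event dates
    is_first = out['event_date'] != out['event_date'].shift(1)
    next_event = out['event_date'].where(is_first)
    # Backfill so the later rows of a run pick up the next run's event date
    out['days_until_next_event'] = next_event.bfill() - out['message_date']
    return out

df = pd.DataFrame({
    'message_count': [1, 3, 5, 6, 2, 8, 10, 2],
    'event_date': ['2016-01-05', '2016-01-05', '2016-01-05', '2016-01-13',
                   '2016-01-13', '2016-01-13', '2016-01-28', '2016-01-28'],
    'message_date': ['2016-01-05', '2016-01-06', '2016-01-10', '2016-01-13',
                     '2016-01-16', '2016-01-22', '2016-01-28', '2016-01-30'],
})
result = add_days_until_next_event(df)
print(result[['event_date', 'message_date', 'days_until_next_event']])
```

Note that this approach relies on the first message of each event falling on the event date itself, as it does in the sample data. If the first message arrives later, that row is compared against its own event_date and the difference comes out negative.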