I have a pandas dataframe and I want to calculate days_until_next_event:
import pandas as pd
df = pd.DataFrame({'message_count': [1, 3, 5, 6, 2, 8, 10, 2], 'event_date': ['2016-01-05', '2016-01-05', '2016-01-05', '2016-01-13', '2016-01-13', '2016-01-13', '2016-01-28', '2016-01-28'], 'message_date': ['2016-01-05', '2016-01-06', '2016-01-10', '2016-01-13', '2016-01-16', '2016-01-22', '2016-01-28', '2016-01-30']})
event_date message_count message_date
2016-01-05 1 2016-01-05
2016-01-05 3 2016-01-06
2016-01-05 5 2016-01-10
2016-01-13 6 2016-01-13
2016-01-13 2 2016-01-16
2016-01-13 8 2016-01-22
2016-01-28 10 2016-01-28
2016-01-28 2 2016-01-30
The expected dataframe looks like this:
days_until_next_event event_date message_count message_date
0 days 2016-01-05 1 2016-01-05
7 days 2016-01-05 3 2016-01-06
3 days 2016-01-05 5 2016-01-10
0 days 2016-01-13 6 2016-01-13
12 days 2016-01-13 2 2016-01-16
6 days 2016-01-13 8 2016-01-22
0 days 2016-01-28 10 2016-01-28
NaT 2016-01-28 2 2016-01-30
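days_until_next_event is the difference between message_date and the next new event_date, i.e. the first event_date on or after the message. If the two dates are the same, the value is 0. I was able to get the days since the last event: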
# vectorized subtraction; both columns must already be datetime64 (e.g. via pd.to_datetime)
df['days_since_last_dte'] = df['message_date'] - df['event_date']
But I'm stuck on the last part: comparing against the next "new" event_date.
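For reference, the definition above can also be computed without relying on row order. The sketch below is an alternative that neither the question nor the answers use: it assumes pandas' pd.merge_asof with direction='forward', which matches each message_date to the first distinct event_date on or after it (exact matches are allowed by default, so a message on an event day gets 0 days).

```python
import pandas as pd

df = pd.DataFrame({'message_count': [1, 3, 5, 6, 2, 8, 10, 2],
                   'event_date': ['2016-01-05', '2016-01-05', '2016-01-05', '2016-01-13',
                                  '2016-01-13', '2016-01-13', '2016-01-28', '2016-01-28'],
                   'message_date': ['2016-01-05', '2016-01-06', '2016-01-10', '2016-01-13',
                                    '2016-01-16', '2016-01-22', '2016-01-28', '2016-01-30']})
df['event_date'] = pd.to_datetime(df['event_date'])
df['message_date'] = pd.to_datetime(df['message_date'])

# One row per distinct event date, sorted (merge_asof requires sorted keys)
events = pd.DataFrame({'next_event_date': df['event_date'].drop_duplicates().sort_values()})

# Keep the original row labels in an 'index' column, then sort by the left key
left = df.reset_index().sort_values('message_date')

# direction='forward': pick the first next_event_date >= message_date (NaT if none)
merged = pd.merge_asof(left, events, left_on='message_date',
                       right_on='next_event_date', direction='forward')

# Restore the original row order/labels so the assignment aligns with df
merged = merged.set_index('index').sort_index()
df['days_until_next_event'] = merged['next_event_date'] - merged['message_date']
print(df)
```

Because the match is made by date rather than by position, this does not need df to be pre-sorted by event_date; the sorting inside the sketch is only what merge_asof itself requires.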
Answer 0 (score: 4)
IIUC (PS: this assumes your df is already sorted; if it isn't, sort it with sort_values first)
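df[['event_date','message_date']]=df[['event_date','message_date']].apply(pd.to_datetime)
df['New']=df.event_date.map(pd.Series(df.event_date.unique()[1:],index=df.event_date.unique()[:-1]))
df['DiffDays']=df.New-df.message_date
df.loc[df.groupby('event_date').head(1).index,'DiffDays']=0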
df
Out[1191]:
event_date message_count message_date New DiffDays
0 2016-01-05 1 2016-01-05 2016-01-13 0
1 2016-01-05 3 2016-01-06 2016-01-13 7 days 00:00:00
2 2016-01-05 5 2016-01-10 2016-01-13 3 days 00:00:00
3 2016-01-13 6 2016-01-13 2016-01-28 0
4 2016-01-13 2 2016-01-16 2016-01-28 12 days 00:00:00
5 2016-01-13 8 2016-01-22 2016-01-28 6 days 00:00:00
6 2016-01-28 10 2016-01-28 NaT 0
7 2016-01-28 2 2016-01-30 NaT NaT
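The key step in this answer is the lookup Series that maps each distinct event_date to the distinct event_date after it. Below is a self-contained sketch of the same approach. As a variation that is not in the answer, it assigns pd.Timedelta(0) instead of 0, so DiffDays stays a timedelta64 column rather than mixing integers and timedeltas (which is why the output above shows 0 next to 7 days 00:00:00). Like the answer, it assumes df is sorted by event_date and that the first message of each event falls on the event date.

```python
import pandas as pd

df = pd.DataFrame({'message_count': [1, 3, 5, 6, 2, 8, 10, 2],
                   'event_date': ['2016-01-05', '2016-01-05', '2016-01-05', '2016-01-13',
                                  '2016-01-13', '2016-01-13', '2016-01-28', '2016-01-28'],
                   'message_date': ['2016-01-05', '2016-01-06', '2016-01-10', '2016-01-13',
                                    '2016-01-16', '2016-01-22', '2016-01-28', '2016-01-30']})
df['event_date'] = pd.to_datetime(df['event_date'])
df['message_date'] = pd.to_datetime(df['message_date'])

# unique() keeps first-appearance order, so this relies on df being sorted by event_date
uniq = df['event_date'].unique()

# Lookup table: each distinct event_date -> the distinct event_date that follows it
next_event = pd.Series(uniq[1:], index=uniq[:-1])

# The last event has no successor, so map() leaves NaT there
df['New'] = df['event_date'].map(next_event)
df['DiffDays'] = df['New'] - df['message_date']

# Zero out the first row of each event group; Timedelta(0) keeps the dtype timedelta64
first_rows = df.groupby('event_date').head(1).index
df.loc[first_rows, 'DiffDays'] = pd.Timedelta(0)
print(df)
```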
Answer 1 (score: 3)
Using bfill:
df.event_date = pd.to_datetime(df.event_date)
df.message_date = pd.to_datetime(df.message_date)
Create a new column 'next_event_date' that holds event_date only on the first row of each event group (NaT elsewhere):
df['next_event_date'] = df.loc[df.event_date != df.event_date.shift(1),'event_date']
Backfill next_event_date so each row lines up with the next event:
df['next_event_date'] = df['next_event_date'].bfill()
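To see what these two steps produce, here is a small sketch on the event_date column alone. It uses Series.where in place of the df.loc[...] assignment; both keep the date on the first row of each run of equal event_date values and leave NaT everywhere else, which bfill then fills from the next event below.

```python
import pandas as pd

event_date = pd.to_datetime(pd.Series(['2016-01-05', '2016-01-05', '2016-01-05', '2016-01-13',
                                       '2016-01-13', '2016-01-13', '2016-01-28', '2016-01-28']))

# True on the first row of each run (shift(1) puts NaT on row 0, so row 0 is True too)
is_first = event_date != event_date.shift(1)

# Keep event_date only where is_first is True; other rows become NaT
marked = event_date.where(is_first)

# Backfill: each NaT takes the next non-missing date below it; the tail stays NaT
next_event_date = marked.bfill()

print(pd.DataFrame({'event_date': event_date, 'is_first': is_first,
                    'marked': marked, 'next_event_date': next_event_date}))
```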
Subtract message_date from next_event_date:
df['days_until_next_event'] = df['next_event_date'] - df['message_date']
print(df)
Output:
event_date message_count message_date next_event_date days_until_next_event
0 2016-01-05 1 2016-01-05 2016-01-05 0 days
1 2016-01-05 3 2016-01-06 2016-01-13 7 days
2 2016-01-05 5 2016-01-10 2016-01-13 3 days
3 2016-01-13 6 2016-01-13 2016-01-13 0 days
4 2016-01-13 2 2016-01-16 2016-01-28 12 days
5 2016-01-13 8 2016-01-22 2016-01-28 6 days
6 2016-01-28 10 2016-01-28 2016-01-28 0 days
7 2016-01-28 2 2016-01-30 NaT NaT