This question builds on an earlier one, "How to join two dataframes for which column values are within a certain range?", which was answered by @coldspeed. Below are the DataFrames, modified for my problem:
print(df_1)
timestamp A B User
0 2016-05-14 10:00 0.020228 0.026572 1
1 2016-05-14 10:00 0.057780 0.175499 2
2 2016-05-14 10:00 0.098808 0.620986 3
3 2016-05-14 10:15 0.158789 1.014819 1
4 2016-05-14 10:15 0.038129 2.384590 2
5 2016-05-14 10:15 0.038129 2.384590 3
print(df_2)
start end event User
0 2016-05-14 10:00 2016-05-14 10:54:33 E1 1
1 2016-05-14 10:00 2016-05-14 10:54:37 E2 2
2 2016-05-14 10:00 2016-05-14 10:54:42 E3 3
Desired output:
timestamp A B User event
0 2016-05-14 10:00 0.020228 0.026572 1 E1
1 2016-05-14 10:00 0.057780 0.175499 2 E2
2 2016-05-14 10:00 0.098808 0.620986 3 E3
3 2016-05-14 10:15 0.158789 1.014819 1 E1
4 2016-05-14 10:15 0.038129 2.384590 2 E2
5 2016-05-14 10:15 0.038129 2.384590 3 E3
So I believe I can use the following as a starting point:
idx = pd.IntervalIndex.from_arrays(df_2['start'], df_2['end'], closed='both')
event = df_2.loc[idx.get_indexer(df_1.timestamp), 'event']
df_1['event'] = event.values
However, I need a way to take the User ID into account, so that overlapping sessions from different users do not get mixed up.
Answer 0 (score: 0)
In this case you can use merge_asof, matching on the session end time within each User:
pd.merge_asof(df_1, df_2, left_on='timestamp', right_on='end', by='User', direction='forward')
Out[148]:
timestamp A ... end event
0 2016-05-14 10:00:00 0.020228 ... 2016-05-14 10:54:33 E1
1 2016-05-14 10:00:00 0.057780 ... 2016-05-14 10:54:37 E2
2 2016-05-14 10:00:00 0.098808 ... 2016-05-14 10:54:42 E3
3 2016-05-14 10:15:00 0.158789 ... 2016-05-14 10:54:33 E1
4 2016-05-14 10:15:00 0.038129 ... 2016-05-14 10:54:37 E2
5 2016-05-14 10:15:00 0.038129 ... 2016-05-14 10:54:42 E3
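One caveat worth noting: with direction='forward', merge_asof only guarantees that timestamp <= end for the matched session; it does not check that the timestamp is on or after start. Below is a minimal, self-contained sketch of one way to add that check. The frames are rebuilt from the question's data, and the variable names left, right, merged and result, as well as the extra masking step, are my own additions for illustration rather than part of the original answer.

```python
import pandas as pd

# Rebuild the question's frames so the example runs on its own.
df_1 = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-05-14 10:00"] * 3 + ["2016-05-14 10:15"] * 3),
    "A": [0.020228, 0.057780, 0.098808, 0.158789, 0.038129, 0.038129],
    "B": [0.026572, 0.175499, 0.620986, 1.014819, 2.384590, 2.384590],
    "User": [1, 2, 3, 1, 2, 3],
})
df_2 = pd.DataFrame({
    "start": pd.to_datetime(["2016-05-14 10:00"] * 3),
    "end": pd.to_datetime(["2016-05-14 10:54:33",
                           "2016-05-14 10:54:37",
                           "2016-05-14 10:54:42"]),
    "event": ["E1", "E2", "E3"],
    "User": [1, 2, 3],
})

# merge_asof requires both frames to be sorted by their join keys.
left = df_1.sort_values("timestamp", kind="stable")
right = df_2.sort_values("end", kind="stable")

# For each row, pick the same user's session with the nearest end >= timestamp.
merged = pd.merge_asof(left, right, left_on="timestamp", right_on="end",
                       by="User", direction="forward")

# Assumed extra step: keep the event only if the timestamp is not before
# the session start; otherwise leave it missing.
merged["event"] = merged["event"].where(merged["timestamp"] >= merged["start"])

result = merged.drop(columns=["start", "end"])
print(result)
```

If a user can have several sessions, this picks the session whose end is the nearest one at or after the timestamp, so a timestamp that falls in a gap between two sessions ends up with a missing event instead of being assigned to the later session.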