Suppose I have two DataFrames with the following values:
DF1:
Name     Time-In
Person1  2020-04-21 20:32:44
Person2  2020-04-21 20:37:19
Person3  2020-04-21 20:44:04
Person1  2020-04-21 21:17:22
Person1  2020-04-21 23:00:00
DF2:
Name     Time-Out
Person1  2020-04-21 20:50:11
Person2  2020-04-21 21:15:15
Person1  2020-04-21 22:00:59
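I want to merge the tables by the order in which each name appears (the first Time-In for Person1 in DF1 is paired with the first Time-Out for Person1 in DF2), leaving NaN where there is no match: for names like Person3 that have no record in DF2, and for cases where Person1 has extra rows in DF1. The final table would look like this: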
DF3:
Name     Time-In              Time-Out
Person1  2020-04-21 20:32:44  2020-04-21 20:50:11
Person2  2020-04-21 20:37:19  2020-04-21 21:15:15
Person3  2020-04-21 20:44:04  NaN
Person1  2020-04-21 21:17:22  2020-04-21 22:00:59
Person1  2020-04-21 23:00:00  NaN
Any ideas on how to do this? Thanks in advance.
Answer 0 (score: 0)
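Use merge_asof with the direction='forward' parameter. It pairs each Time-In with the first Time-Out at or after it for the same Name, which gives the requested result for this data. Both DataFrames must already be sorted by their time column: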
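import pandas as pd

# merge_asof needs real datetimes, not strings
df1['Time-In'] = pd.to_datetime(df1['Time-In'])
df2['Time-Out'] = pd.to_datetime(df2['Time-Out'])

# For each Time-In, take the first Time-Out at or after it for the same Name
df = pd.merge_asof(df1,
                   df2,
                   left_on='Time-In',
                   right_on='Time-Out',
                   by='Name',
                   direction='forward')
print(df)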
      Name             Time-In            Time-Out
0  Person1 2020-04-21 20:32:44 2020-04-21 20:50:11
1  Person2 2020-04-21 20:37:19 2020-04-21 21:15:15
2  Person3 2020-04-21 20:44:04                 NaT
3  Person1 2020-04-21 21:17:22 2020-04-21 22:00:59
4  Person1 2020-04-21 23:00:00                 NaT
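
If the pairing should follow the order of appearance strictly, regardless of the timestamps, one possible alternative is to number each name's occurrences and do a left merge on that counter. This is only a sketch and not part of the original answer: the helper column name occ is an assumption, and the frames are rebuilt from the question's sample data so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Name": ["Person1", "Person2", "Person3", "Person1", "Person1"],
    "Time-In": pd.to_datetime([
        "2020-04-21 20:32:44", "2020-04-21 20:37:19", "2020-04-21 20:44:04",
        "2020-04-21 21:17:22", "2020-04-21 23:00:00",
    ]),
})
df2 = pd.DataFrame({
    "Name": ["Person1", "Person2", "Person1"],
    "Time-Out": pd.to_datetime([
        "2020-04-21 20:50:11", "2020-04-21 21:15:15", "2020-04-21 22:00:59",
    ]),
})

# "occ" (hypothetical helper column) is the 0-based occurrence number
# of each Name within its own frame: 0 for the first row, 1 for the second, ...
left = df1.assign(occ=df1.groupby("Name").cumcount())
right = df2.assign(occ=df2.groupby("Name").cumcount())

# A left merge keeps every row of df1 in its original order;
# rows with no n-th Time-Out for that Name get NaT
df3 = left.merge(right, on=["Name", "occ"], how="left").drop(columns="occ")
print(df3)
```

The two approaches can disagree: merge_asof does not consume rows from the right frame, so two Time-In values that both precede the same Time-Out would be matched to that one Time-Out, whereas the counter-based merge uses each Time-Out at most once.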