I have two DataFrames with the following sample data:
created_at PM 2.5 PM 10 entry_id
2018-06-13 16:11:43 4.67 5.17 20
2018-06-14 11:16:43 5.01 8.05 21
action end_at
done_at
2018-06-13 10:15:00 action 1 Nan
2018-06-11 12:15:00 action 2 Nan
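I want to add the "PM 10" value to the second DataFrame, taking it from the row of the first DataFrame whose timestamp is closest. The new DataFrame should look like this: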
action end_at PM 10
done_at
2018-06-13 10:15:00 action 1 Nan 5.17
2018-06-11 12:15:00 action 2 Nan 5.17
The problem is that the timestamps never match exactly. Is this possible?
Answer 0 (score: 2)
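You can use merge_asof with direction='nearest' for this. Judging by the format of the sample DataFrames, done_at appears to be the index of the second one, so you have to reset the index first. Note that merge_asof also requires both frames to be sorted by their key columns: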
>>> df1
created_at PM 2.5 PM 10 entry_id
0 2018-06-13 16:11:43 4.67 5.17 20
1 2018-06-14 11:16:43 5.01 8.05 21
>>> df2
action end_at
done_at
2018-06-13 10:15:00 action 1 Nan
2018-06-11 12:15:00 action 2 Nan
import pandas as pd

# merge_asof needs datetime keys, and both frames sorted by their key
df1['created_at'] = pd.to_datetime(df1['created_at'])
df2.index = pd.to_datetime(df2.index)
new_df = (pd.merge_asof(df2.reset_index().sort_values('done_at'),
                        df1[['created_at', 'PM 10']].sort_values('created_at'),
                        left_on='done_at', right_on='created_at',
                        direction='nearest')
            .drop('created_at', axis=1))
>>> new_df
done_at action end_at PM 10
0 2018-06-11 12:15:00 action 2 Nan 5.17
1 2018-06-13 10:15:00 action 1 Nan 5.17
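Note that the result above comes back sorted by done_at with a fresh integer index, whereas the desired output in the question keeps done_at as the index in its original row order. A minimal self-contained sketch of one way to get that shape follows. The sample frames are rebuilt by hand from the question, the names left, right, merged and result are only illustrative, and the final reindex step assumes the done_at values are unique:

```python
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption about
# how the original frames were constructed).
df1 = pd.DataFrame({
    "created_at": pd.to_datetime(["2018-06-13 16:11:43", "2018-06-14 11:16:43"]),
    "PM 2.5": [4.67, 5.01],
    "PM 10": [5.17, 8.05],
    "entry_id": [20, 21],
})
df2 = pd.DataFrame(
    {"action": ["action 1", "action 2"], "end_at": ["Nan", "Nan"]},
    index=pd.to_datetime(["2018-06-13 10:15:00", "2018-06-11 12:15:00"]).rename("done_at"),
)

# merge_asof requires both sides to be sorted by their key columns.
left = df2.reset_index().sort_values("done_at")
right = df1[["created_at", "PM 10"]].sort_values("created_at")

merged = pd.merge_asof(
    left, right,
    left_on="done_at", right_on="created_at",
    direction="nearest",
).drop(columns="created_at")

# Put done_at back as the index and restore the original row order.
# This step assumes done_at contains no duplicate timestamps.
result = merged.set_index("done_at").reindex(df2.index)
print(result)
```

If done_at can contain duplicates, reindex will fail; in that case one option is to carry a helper column with the original row position through the merge and sort by it afterwards.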