I have two different pandas DataFrames: one is sampled every 60 milliseconds and the other every 1 second.
tobii_df:
datetime, col1
0 2017-08-24 12:59:00.753, 1.3
...
1 2017-08-24 12:59:13.753, 7.2
2 2017-08-24 12:59:13.773, 6.1
3 2017-08-24 12:59:13.793, 5.1
4 2017-08-24 13:00:00.813, 5.4
hr_df:
datetime, col2
0 2017-08-24 12:59:00, 60
1 2017-08-24 13:00:00, 64
2 2017-08-24 13:01:00, 63
3 2017-08-24 13:02:00, 67
4 2017-08-24 13:03:00, 61
I would like the final result to be:
datetime, col1, col2
0 2017-08-24 12:59:00.753, 1.3, 60
...
1 2017-08-24 12:59:13.753, 7.2,
2 2017-08-24 12:59:13.773, 6.1,
3 2017-08-24 12:59:13.793, 5.1,
4 2017-08-24 13:00:00.813, 5.4, 64
The following code merges the two DataFrames, but it repeats the value 60 across multiple tobii_df readings:
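import pandas as pd

# merge_asof requires both frames to be sorted by the merge key
hr_df = hr_df.sort_values(by='datetime')
tobii_df = tobii_df.sort_values(by='datetime')
hr_df = hr_df.set_index('datetime')
tobii_df = tobii_df.set_index('datetime')
# Default direction='backward': each tobii_df row takes the most recent earlier hr_df row
merged_df = pd.merge_asof(tobii_df, hr_df, left_index=True, right_index=True, suffixes=('_', ''))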
I also tried:
hr_df = hr_df.set_index('datetime')\
.reindex(tobii_df.set_index('datetime').index, method='nearest')\
.reset_index()
merged_df = pd.merge(tobii_df, hr_df, on='datetime')
This also produced repeated values for every tobii_df reading. The final result of both snippets looks like:
datetime, col1, col2
0 2017-08-24 12:59:00.753, 1.3, 60
...
1 2017-08-24 12:59:13.753, 7.2, 60
2 2017-08-24 12:59:13.773, 6.1, 60
3 2017-08-24 12:59:13.793, 5.1, 60
4 2017-08-24 13:00:00.813, 5.4, 64
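
For reference, here is a minimal sketch of one possible way to get the desired output. It assumes each hr_df value should appear only on the first tobii_df reading at or after its timestamp. The helper column name hr_time is hypothetical, and the sample frames are rebuilt from the rows shown above (the elided "..." rows are not included). The idea is to keep the matched hr_df timestamp during merge_asof and then blank col2 wherever the same hr_df row has already been used.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question
tobii_df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2017-08-24 12:59:00.753",
        "2017-08-24 12:59:13.753",
        "2017-08-24 12:59:13.773",
        "2017-08-24 12:59:13.793",
        "2017-08-24 13:00:00.813",
    ]),
    "col1": [1.3, 7.2, 6.1, 5.1, 5.4],
})
hr_df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2017-08-24 12:59:00",
        "2017-08-24 13:00:00",
        "2017-08-24 13:01:00",
        "2017-08-24 13:02:00",
        "2017-08-24 13:03:00",
    ]),
    "col2": [60, 64, 63, 67, 61],
})

# Make sure both keys share the same datetime resolution before merging
tobii_df["datetime"] = tobii_df["datetime"].astype("datetime64[ns]")
hr_df["datetime"] = hr_df["datetime"].astype("datetime64[ns]")

# merge_asof requires both frames to be sorted by the key
tobii_df = tobii_df.sort_values("datetime")
hr_df = hr_df.sort_values("datetime")

# Copy the hr timestamp into a helper column (hypothetical name) so the
# merged frame records which hr_df row each tobii_df row was matched to
hr_tagged = hr_df.assign(hr_time=hr_df["datetime"])

merged_df = pd.merge_asof(tobii_df, hr_tagged, on="datetime", direction="backward")

# Keep col2 only on the first row matched to each hr_df reading
already_used = merged_df["hr_time"].duplicated()
merged_df["col2"] = merged_df["col2"].mask(already_used)
merged_df = merged_df.drop(columns="hr_time")

print(merged_df)
```

Note that masking introduces NaN, so col2 becomes a float column. If the heart-rate values must stay integers, converting col2 to the nullable "Int64" dtype afterwards is one option.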