I have two pandas DataFrames in Python. dataframe1 contains the following sample data.
lat long tep height altitude date_time
40.007647 116.319781 0 83 39688.535613 2008-08-20 12:51:17
40.007632 116.319878 0 119 39688.535637 2008-08-20 12:51:19
40.007615 116.319838 0 112 39688.535660 2008-08-20 12:51:21
dataframe2 contains the following sample data.
Start_Time End_Time Transportation_Mode
2008-08-20 12:09:17 2008-08-20 12:45:05 walk
2008-08-20 12:45:05 2008-08-20 13:00:25 subway
2008-08-20 13:00:25 2008-08-20 13:07:25 walk
2008-08-20 13:07:25 2008-08-20 13:12:59 bus
2008-08-20 13:13:59 2008-08-20 13:24:23 walk
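If the date_time value of a row in dataframe1 falls between the Start_Time and End_Time of a row in dataframe2, the Transportation_Mode of that dataframe2 row should be selected and appended to dataframe1 as a new Transportation_Mode column. In SQL terms, this is a join on date_time BETWEEN Start_Time AND End_Time.
The final result should look like this: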
lat long tep height altitude date_time Transportation_Mode
40.007647 116.319781 0 83 39688.535613 2008-08-20 12:51:17 subway
40.007632 116.319878 0 119 39688.535637 2008-08-20 12:51:19 subway
40.007615 116.319838 0 112 39688.535660 2008-08-20 12:51:21 subway
Answer 0 (score: 1)
merge_asof can help you here:
pd.merge_asof(df1, df2, left_on='date_time', right_on='Start_Time')
Result:
         lat        long  tep  height      altitude            date_time           Start_Time             End_Time Transportation_Mode
0  40.007647  116.319781    0      83  39688.535613  2008-08-20 12:51:17  2008-08-20 12:45:05  2008-08-20 13:00:25              subway
1  40.007632  116.319878    0     119  39688.535637  2008-08-20 12:51:19  2008-08-20 12:45:05  2008-08-20 13:00:25              subway
2  40.007615  116.319838    0     112  39688.535660  2008-08-20 12:51:21  2008-08-20 12:45:05  2008-08-20 13:00:25              subway
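For anyone who wants to reproduce the call above, here is a minimal sketch built from the sample rows in the question. It assumes the timestamps start out as strings, so they are converted with pd.to_datetime first, and it sorts both frames by their join key because merge_asof expects sorted input. Only a subset of the columns is kept, and the name merged is just an illustrative choice.

```python
import pandas as pd

# Sample rows from the question (only the columns needed for the join).
df1 = pd.DataFrame({
    "lat": [40.007647, 40.007632, 40.007615],
    "long": [116.319781, 116.319878, 116.319838],
    "date_time": ["2008-08-20 12:51:17", "2008-08-20 12:51:19", "2008-08-20 12:51:21"],
})
df2 = pd.DataFrame({
    "Start_Time": ["2008-08-20 12:09:17", "2008-08-20 12:45:05", "2008-08-20 13:00:25",
                   "2008-08-20 13:07:25", "2008-08-20 13:13:59"],
    "End_Time": ["2008-08-20 12:45:05", "2008-08-20 13:00:25", "2008-08-20 13:07:25",
                 "2008-08-20 13:12:59", "2008-08-20 13:24:23"],
    "Transportation_Mode": ["walk", "subway", "walk", "bus", "walk"],
})

# The join keys should be datetime (or numeric) columns, not strings.
df1["date_time"] = pd.to_datetime(df1["date_time"])
for col in ("Start_Time", "End_Time"):
    df2[col] = pd.to_datetime(df2[col])

# Both frames must be sorted by their join key.
df1 = df1.sort_values("date_time").reset_index(drop=True)
df2 = df2.sort_values("Start_Time").reset_index(drop=True)

# For each date_time, pick the last df2 row whose Start_Time is <= date_time.
merged = pd.merge_asof(df1, df2, left_on="date_time", right_on="Start_Time")
print(merged[["date_time", "Start_Time", "End_Time", "Transportation_Mode"]])
```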
This only looks at Start_Time. If you also want to check End_Time, you can do this:
start = pd.merge_asof(df1, df2, left_on='date_time', right_on='Start_Time')['Transportation_Mode']
end = pd.merge_asof(df1, df2, left_on='date_time', right_on='End_Time', direction='forward')['Transportation_Mode']
pd.concat((df1, start[start == end].reindex(df1.index)), axis=1)
         lat        long  tep  height      altitude            date_time Transportation_Mode
0  40.007647  116.319781    0      83  39688.535613  2008-08-20 12:51:17              subway
1  40.007632  116.319878    0     119  39688.535637  2008-08-20 12:51:19              subway
2  40.007615  116.319838    0     112  39688.535660  2008-08-20 12:51:21              subway
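To see why the End_Time check matters, consider a timestamp that falls in the gap between two intervals. In the sample data the bus trip ends at 13:12:59 and the next walk starts at 13:13:59, so 13:13:30 belongs to neither. The sketch below is only an illustration: the second timestamp is made up for the demonstration, and the names gap_df and result are hypothetical.

```python
import pandas as pd

df2 = pd.DataFrame({
    "Start_Time": pd.to_datetime(["2008-08-20 12:09:17", "2008-08-20 12:45:05",
                                  "2008-08-20 13:00:25", "2008-08-20 13:07:25",
                                  "2008-08-20 13:13:59"]),
    "End_Time": pd.to_datetime(["2008-08-20 12:45:05", "2008-08-20 13:00:25",
                                "2008-08-20 13:07:25", "2008-08-20 13:12:59",
                                "2008-08-20 13:24:23"]),
    "Transportation_Mode": ["walk", "subway", "walk", "bus", "walk"],
})

# One timestamp inside the subway interval, one invented timestamp in the gap.
gap_df = pd.DataFrame({
    "date_time": pd.to_datetime(["2008-08-20 12:51:17", "2008-08-20 13:13:30"]),
})

# Backward search on Start_Time: last interval that started before date_time.
start = pd.merge_asof(gap_df, df2, left_on="date_time",
                      right_on="Start_Time")["Transportation_Mode"]
# Forward search on End_Time: first interval that ends after date_time.
end = pd.merge_asof(gap_df, df2, left_on="date_time",
                    right_on="End_Time", direction="forward")["Transportation_Mode"]

# Keep a mode only when both searches agree; other rows become NaN.
result = pd.concat((gap_df, start[start == end].reindex(gap_df.index)), axis=1)
print(result)
```

For the gap timestamp the backward search returns bus and the forward search returns walk, so the two disagree and the row is left as NaN.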
The direction argument of merge_asof was added in pandas 0.20. If you cannot use pandas 0.20 or later, you can try this instead:
start = pd.merge_asof(df1, df2, left_on='date_time', right_on='Start_Time')
transportation_mode = start['Transportation_Mode'].loc[(start['Start_Time'] < start['date_time']) & (start['date_time'] < start['End_Time'])]
pd.concat((df1, transportation_mode), axis=1)
            lat        long  tep  height      altitude            date_time Transportation_Mode
0     40.007647  116.319781    0      83  39688.535613  2008-08-20 12:51:17              subway
1     40.007632  116.319878    0     119  39688.535637  2008-08-20 12:51:19              subway
2     40.007615  116.319838    0     112  39688.535660  2008-08-20 12:51:21              subway
3894  40.088452  116.306029    0     177  39680.677581  2008-08-20 16:15:43                 NaN
3895  40.088434  116.306011    0     178  39680.677604  2008-08-20 16:15:45                 NaN
3896  40.088423  116.306002    0     179  39680.677627  2008-08-20 16:15:47                 NaN
3897  40.088405  116.305990    0     179  39680.677650  2008-08-20 16:15:49                 NaN
3898  40.088387  116.305963    0     180  39680.677674  2008-08-20 16:15:51                 NaN
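The NaN rows above come from timestamps that lie after the last End_Time. A small self-contained sketch of the same fallback, using one in-range and one out-of-range timestamp taken from the output, might look like this. The names inside and result are illustrative, and only the date_time column of dataframe1 is kept.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date_time": pd.to_datetime(["2008-08-20 12:51:17", "2008-08-20 16:15:43"]),
})
df2 = pd.DataFrame({
    "Start_Time": pd.to_datetime(["2008-08-20 12:09:17", "2008-08-20 12:45:05",
                                  "2008-08-20 13:00:25", "2008-08-20 13:07:25",
                                  "2008-08-20 13:13:59"]),
    "End_Time": pd.to_datetime(["2008-08-20 12:45:05", "2008-08-20 13:00:25",
                                "2008-08-20 13:07:25", "2008-08-20 13:12:59",
                                "2008-08-20 13:24:23"]),
    "Transportation_Mode": ["walk", "subway", "walk", "bus", "walk"],
})

# Default backward search on Start_Time; End_Time comes along in the merged row.
start = pd.merge_asof(df1, df2, left_on="date_time", right_on="Start_Time")

# Keep the mode only where date_time is strictly inside the matched interval.
inside = (start["Start_Time"] < start["date_time"]) & (start["date_time"] < start["End_Time"])
transportation_mode = start.loc[inside, "Transportation_Mode"]

# Rows filtered out above have no entry and show up as NaN after alignment.
result = pd.concat((df1, transportation_mode), axis=1)
print(result)
```

Because the comparisons are strict, a date_time that is exactly equal to a Start_Time or End_Time is also left as NaN; switch to <= if boundary values should count as a match.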