Date range comparison in a pandas DataFrame

Time: 2017-08-31 06:27:45

Tags: python pandas date dataframe

I have two pandas DataFrames in Python. dataframe1 contains the sample data.

      lat        long  tep  height      altitude            date_time
40.007647  116.319781    0      83  39688.535613  2008-08-20 12:51:17
40.007632  116.319878    0     119  39688.535637  2008-08-20 12:51:19
40.007615  116.319838    0     112  39688.535660  2008-08-20 12:51:21

dataframe2 contains the following sample data.

Start_Time           End_Time             Transportation_Mode
2008-08-20 12:09:17  2008-08-20 12:45:05  walk
2008-08-20 12:45:05  2008-08-20 13:00:25  subway
2008-08-20 13:00:25  2008-08-20 13:07:25  walk
2008-08-20 13:07:25  2008-08-20 13:12:59  bus
2008-08-20 13:13:59  2008-08-20 13:24:23  walk

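If the date_time field of a row in dataframe1 falls between the Start_Time and End_Time fields of a row in dataframe2, dataframe1 should pick up the transportation mode from that row. In other words, select Transportation_Mode from dataframe2 and append it to dataframe1 as a new Transportation_Mode column.

The final result should look like this:

      lat        long  tep  height      altitude            date_time  Transportation_Mode
40.007647  116.319781    0      83  39688.535613  2008-08-20 12:51:17  subway
40.007632  116.319878    0     119  39688.535637  2008-08-20 12:51:19  subway
40.007615  116.319838    0     112  39688.535660  2008-08-20 12:51:21  subway

In SQL terms, this is a join of the two tables on the condition that date_time lies between Start_Time and End_Time.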
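
The interval condition described in the question (date_time between Start_Time and End_Time) can be written as a SQL join. The following is only a rough sketch using Python's built-in sqlite3 module, not the asker's original statement. The table names samples and modes are hypothetical, and the timestamps are stored as ISO-formatted text so that plain string comparison orders them correctly.

```python
import sqlite3

# In-memory database; the table names "samples" and "modes" are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (date_time TEXT)")
conn.execute(
    "CREATE TABLE modes (Start_Time TEXT, End_Time TEXT, Transportation_Mode TEXT)"
)

# Rows copied from the question's sample data.
conn.executemany(
    "INSERT INTO samples VALUES (?)",
    [
        ("2008-08-20 12:51:17",),
        ("2008-08-20 12:51:19",),
        ("2008-08-20 12:51:21",),
    ],
)
conn.executemany(
    "INSERT INTO modes VALUES (?, ?, ?)",
    [
        ("2008-08-20 12:09:17", "2008-08-20 12:45:05", "walk"),
        ("2008-08-20 12:45:05", "2008-08-20 13:00:25", "subway"),
        ("2008-08-20 13:00:25", "2008-08-20 13:07:25", "walk"),
        ("2008-08-20 13:07:25", "2008-08-20 13:12:59", "bus"),
        ("2008-08-20 13:13:59", "2008-08-20 13:24:23", "walk"),
    ],
)

# LEFT JOIN keeps samples that fall outside every interval (their mode is NULL).
rows = conn.execute(
    """
    SELECT s.date_time, m.Transportation_Mode
    FROM samples AS s
    LEFT JOIN modes AS m
      ON s.date_time BETWEEN m.Start_Time AND m.End_Time
    ORDER BY s.date_time
    """
).fetchall()

for row in rows:
    print(row)
```

Note that BETWEEN is inclusive at both ends, so a timestamp that sits exactly on a shared boundary (for example 12:45:05) would match two intervals and produce two rows.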

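1 Answer:

Answer 0 (score: 1)

merge_asof can help here. Both DataFrames must be sorted by their key columns, and the keys should be parsed as datetimes (for example with pd.to_datetime) rather than left as strings.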

pd.merge_asof(df1, df2, left_on='date_time', right_on='Start_Time')

Result:

         lat        long  tep  height      altitude            date_time           Start_Time             End_Time Transportation_Mode
0  40.007647  116.319781    0      83  39688.535613  2008-08-20 12:51:17  2008-08-20 12:45:05  2008-08-20 13:00:25              subway
1  40.007632  116.319878    0     119  39688.535637  2008-08-20 12:51:19  2008-08-20 12:45:05  2008-08-20 13:00:25              subway
2  40.007615  116.319838    0     112  39688.535660  2008-08-20 12:51:21  2008-08-20 12:45:05  2008-08-20 13:00:25              subway
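
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds df1 and df2 from the sample rows in the question (only a few of the df1 columns are included to keep it short) and assumes the time columns are converted with pd.to_datetime.

```python
import pandas as pd

# Sample rows copied from the question; only some df1 columns are kept for brevity.
df1 = pd.DataFrame({
    "lat": [40.007647, 40.007632, 40.007615],
    "long": [116.319781, 116.319878, 116.319838],
    "date_time": pd.to_datetime([
        "2008-08-20 12:51:17",
        "2008-08-20 12:51:19",
        "2008-08-20 12:51:21",
    ]),
})
df2 = pd.DataFrame({
    "Start_Time": pd.to_datetime([
        "2008-08-20 12:09:17",
        "2008-08-20 12:45:05",
        "2008-08-20 13:00:25",
        "2008-08-20 13:07:25",
        "2008-08-20 13:13:59",
    ]),
    "End_Time": pd.to_datetime([
        "2008-08-20 12:45:05",
        "2008-08-20 13:00:25",
        "2008-08-20 13:07:25",
        "2008-08-20 13:12:59",
        "2008-08-20 13:24:23",
    ]),
    "Transportation_Mode": ["walk", "subway", "walk", "bus", "walk"],
})

# merge_asof requires both frames to be sorted by their join keys.
df1 = df1.sort_values("date_time")
df2 = df2.sort_values("Start_Time")

# Default direction is "backward": each date_time is matched to the
# last Start_Time that is less than or equal to it.
merged = pd.merge_asof(df1, df2, left_on="date_time", right_on="Start_Time")
print(merged[["date_time", "Start_Time", "End_Time", "Transportation_Mode"]])
```

Because the match is based on Start_Time alone, a timestamp that falls after the End_Time of the matched row would still receive that row's mode.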

This only looks at Start_Time. If you also want to check End_Time, you can do this:

start = pd.merge_asof(df1, df2, left_on='date_time', right_on='Start_Time')['Transportation_Mode']
end = pd.merge_asof(df1, df2, left_on='date_time', right_on='End_Time', direction='forward')['Transportation_Mode']

pd.concat((df1, start[start == end].reindex(df1.index)), axis=1)

         lat        long  tep  height      altitude            date_time Transportation_Mode
0  40.007647  116.319781    0      83  39688.535613  2008-08-20 12:51:17              subway
1  40.007632  116.319878    0     119  39688.535637  2008-08-20 12:51:19              subway
2  40.007615  116.319838    0     112  39688.535660  2008-08-20 12:51:21              subway
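
To see what the End_Time check adds, here is a small sketch with one extra timestamp, 13:13:30, which is a made-up value chosen to fall in the gap between the bus interval (ending 13:12:59) and the following walk interval (starting 13:13:59). It assumes df1 has a default 0..n-1 index, since merge_asof returns a freshly indexed result.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2008-08-20 12:51:17",  # inside the subway interval (from the question)
        "2008-08-20 13:13:30",  # hypothetical point in the gap between bus and walk
    ]),
})
df2 = pd.DataFrame({
    "Start_Time": pd.to_datetime([
        "2008-08-20 12:09:17",
        "2008-08-20 12:45:05",
        "2008-08-20 13:00:25",
        "2008-08-20 13:07:25",
        "2008-08-20 13:13:59",
    ]),
    "End_Time": pd.to_datetime([
        "2008-08-20 12:45:05",
        "2008-08-20 13:00:25",
        "2008-08-20 13:07:25",
        "2008-08-20 13:12:59",
        "2008-08-20 13:24:23",
    ]),
    "Transportation_Mode": ["walk", "subway", "walk", "bus", "walk"],
})

# Backward match on Start_Time: the interval that started most recently.
start = pd.merge_asof(
    df1, df2, left_on="date_time", right_on="Start_Time"
)["Transportation_Mode"]

# Forward match on End_Time: the interval that ends next.
end = pd.merge_asof(
    df1, df2, left_on="date_time", right_on="End_Time", direction="forward"
)["Transportation_Mode"]

# Keep a mode only where both lookups agree; other rows become NaN.
result = pd.concat((df1, start[start == end].reindex(df1.index)), axis=1)
print(result)
```

One caveat: this compares mode labels, not the intervals themselves, so a gap between two intervals that share the same mode would not be detected.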

Without direction

The direction parameter was added in pandas 0.20. If you can't use pandas 0.20 or later, you can try this:

start = pd.merge_asof(df1, df2, left_on='date_time', right_on='Start_Time')
transportation_mode = start['Transportation_Mode'].loc[(start['Start_Time'] < start['date_time']) & (start['date_time'] < start['End_Time'])]
pd.concat((df1, transportation_mode), axis=1)

            lat        long  tep  height      altitude            date_time Transportation_Mode
0     40.007647  116.319781    0      83  39688.535613  2008-08-20 12:51:17              subway
1     40.007632  116.319878    0     119  39688.535637  2008-08-20 12:51:19              subway
2     40.007615  116.319838    0     112  39688.535660  2008-08-20 12:51:21              subway
3894  40.088452  116.306029    0     177  39680.677581  2008-08-20 16:15:43                 NaN
3895  40.088434  116.306011    0     178  39680.677604  2008-08-20 16:15:45                 NaN
3896  40.088423  116.306002    0     179  39680.677627  2008-08-20 16:15:47                 NaN
3897  40.088405  116.305990    0     179  39680.677650  2008-08-20 16:15:51                 NaN
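
A compact way to try the version without direction is sketched below. It uses one timestamp from inside the subway interval and one from the output above (16:15:43) that lies after every interval in df2. As before, df1 is assumed to have a default 0..n-1 index so that it lines up with the merge_asof result.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2008-08-20 12:51:17",  # inside the subway interval
        "2008-08-20 16:15:43",  # later than every End_Time in df2
    ]),
})
df2 = pd.DataFrame({
    "Start_Time": pd.to_datetime([
        "2008-08-20 12:09:17",
        "2008-08-20 12:45:05",
        "2008-08-20 13:00:25",
        "2008-08-20 13:07:25",
        "2008-08-20 13:13:59",
    ]),
    "End_Time": pd.to_datetime([
        "2008-08-20 12:45:05",
        "2008-08-20 13:00:25",
        "2008-08-20 13:07:25",
        "2008-08-20 13:12:59",
        "2008-08-20 13:24:23",
    ]),
    "Transportation_Mode": ["walk", "subway", "walk", "bus", "walk"],
})

# Backward match on Start_Time brings the candidate interval's End_Time along.
start = pd.merge_asof(df1, df2, left_on="date_time", right_on="Start_Time")

# Keep the mode only if date_time is strictly inside the candidate interval.
inside = (start["Start_Time"] < start["date_time"]) & (
    start["date_time"] < start["End_Time"]
)
transportation_mode = start.loc[inside, "Transportation_Mode"]

# Rows filtered out above have no entry in transportation_mode,
# so the index alignment in concat leaves them as NaN.
result = pd.concat((df1, transportation_mode), axis=1)
print(result)
```

The comparisons are strict, so a timestamp exactly equal to a Start_Time or End_Time is treated as outside the interval. Use <= on one side if boundary points should count.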