Merging two pandas DataFrames on a condition

Time: 2018-04-06 14:34:13

Tags: python pandas dataframe data-science

I need to merge two DataFrames on a complex condition. Here are the two DataFrames:

     dock_id     dock_name               avail_bikes    avail_docks  \
0    3082        Hope St & Union Ave     8              16   
1    468         Broadway & W 55 St      0              59   
2    407         Henry St & Poplar St    22             15   
3    3016        Kent Ave & N 7 St       29             16   

    status_key   datehour             ...    visi  vism   wdird     wdire  \
0   1            2016-06-01 19:25:00  ...    NaN   NaN    NaN       NaN   
1   1            2016-06-01 19:25:00  ...    NaN   NaN    NaN       NaN   
2   1            2016-06-01 19:25:00  ...    NaN   NaN    NaN       NaN   
3   1            2016-06-01 19:25:00  ...    NaN   NaN    NaN       NaN    

     tot_docks    _lat               _long         in_service  
0    25           40.711674          -73.951413    1   
1    59           40.765265          -73.981923    1   
2    37           40.700469          -73.991454    1   
3    47           40.720368          -73.961651    1   

...

        Start Date/Time         End Date/Time            Event Agency  \
0       01/01/2016 12:00:00 AM  01/01/2016 02:00:00 AM   Parks Department   
1       01/02/2016 12:00:00 AM  01/02/2016 02:00:00 AM   Parks Department   
2       01/03/2016 12:00:00 AM  01/03/2016 02:00:00 AM   Parks Department   
3       01/04/2016 12:00:00 AM  01/04/2016 02:00:00 AM   Parks Department   

        latitude   longitude  
0       40.782865  -73.965355  
1       40.782865  -73.965355  
2       40.782865  -73.965355  
3       40.782865  -73.965355  
4       40.782865  -73.965355 

I want to join them on this condition:

Start Date/Time <= datehour <= End Date/Time and distance(_lat, _long, latitude, longitude) < d
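
To make the condition concrete, here is a minimal sketch of it as a boolean test. It rests on assumptions that are not stated in the question: that distance means great-circle (haversine) distance, that d is given in kilometres, and that the date columns have already been parsed with pd.to_datetime. The names haversine_km and matches are hypothetical helpers introduced only for illustration.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes d is measured in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or NumPy arrays of degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def matches(datehour, lat, lon, start, end, ev_lat, ev_lon, d_km):
    """True where the event window covers datehour and the event is closer than d_km."""
    in_window = (start <= datehour) & (datehour <= end)
    near = haversine_km(lat, lon, ev_lat, ev_lon) < d_km
    return in_window & near
```

Both helpers work element-wise, so the same code can be applied to single values or to whole columns converted with .to_numpy().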

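I know this can be done by merging the data and then applying a filter to the result, but the datasets are too large (10,263,241 rows and 401,080 rows), so I don't think that approach would finish in a reasonable time.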

Do you know how this problem could be solved?

Thanks for your answers!

1 Answer:

Answer 0: (score: 0)

import pandas as pd
...
new_frame = pd.merge(dataframe1, dataframe2, on=key_columns)  # on: column name(s) shared by both frames

For a more advanced merge, we can also specify the column names: dataframe[['column1', 'column2', ...]]
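
Since pd.merge matches rows on equal key values, it cannot express the range-and-distance condition from the question directly. One possible way to avoid building the full cross product is sketched below; it is an illustration under stated assumptions, not a tested solution for the real data. It assumes datehour, Start Date/Time and End Date/Time are already datetime columns, that distance is the haversine distance in kilometres, and that the first frame is called docks and the second events. The function names haversine_km and conditional_join are hypothetical. The idea is to sort the first frame by datehour, use np.searchsorted to find the slice of rows that falls in each event's time window, and apply the distance filter only to that slice.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumes d is measured in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or NumPy arrays of degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def conditional_join(docks, events, d_km):
    """Pair each event with the dock rows inside its time window and within d_km."""
    docks = docks.sort_values("datehour").reset_index(drop=True)
    times = docks["datehour"].to_numpy()
    lats = docks["_lat"].to_numpy()
    lons = docks["_long"].to_numpy()

    # For every event, locate the block of sorted dock rows with
    # Start Date/Time <= datehour <= End Date/Time.
    lo = np.searchsorted(times, events["Start Date/Time"].to_numpy(), side="left")
    hi = np.searchsorted(times, events["End Date/Time"].to_numpy(), side="right")
    ev_lat = events["latitude"].to_numpy()
    ev_lon = events["longitude"].to_numpy()

    dock_idx = [np.array([], dtype=np.intp)]
    event_idx = [np.array([], dtype=np.intp)]
    for i, (a, b) in enumerate(zip(lo, hi)):
        if a >= b:
            continue  # no dock rows fall inside this event's time window
        dist = haversine_km(lats[a:b], lons[a:b], ev_lat[i], ev_lon[i])
        hits = np.arange(a, b)[dist < d_km]
        dock_idx.append(hits)
        event_idx.append(np.full(len(hits), i, dtype=np.intp))

    # Stitch the matched rows from both frames together side by side.
    left = docks.iloc[np.concatenate(dock_idx)].reset_index(drop=True)
    right = events.iloc[np.concatenate(event_idx)].reset_index(drop=True)
    return pd.concat([left, right], axis=1)
```

The loop still runs once per event, so with about 400,000 events it may be slow, but each iteration only touches the rows inside that event's time window rather than all 10 million. If many events share the same coordinates, grouping them by location first would likely reduce the work further.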