pd.merge_asof失败并出现' ValueError:必须对左键进行排序'在第二轮

时间:2017-09-21 10:18:18

标签: python pandas

您好我试图在最接近的匹配日期_时间上合并两个数据集。

我有两张时间戳用于开放和结束活动。

merge_asof在开放日期运行正常,但返回' ValueError:左键必须在第二个date_time排序'

我在两种情况下按相关的date_time排序。

第一个数据帧:

   idtbl_station_manager     date_time_stamp fld_station_number  \
0                   1121 2017-09-19 15:41:24            AM00571   
1                   1122 2017-09-19 15:41:24            AM00572   
2                   1123 2017-09-19 15:41:24            AM00573   

  fld_grid_number fld_status  fld_station_number_int  \
0     VOY-024-001     CLOSED                     571   
1     VOY-024-002     CLOSED                     572   
2     VOY-024-003     CLOSED                     573   

                  fld_activities date_time_stamp_open fld_lat_open  \
0  Drift Net,CTD-Overside,Dredge  2017-04-13 07:23:35                
1  Drift Net,CTD-Overside,Dredge  2017-04-13 10:15:07   4649.028 S   
2  Drift Net,CTD-Overside,Dredge  2017-04-13 13:15:42   4648.497 S   

  fld_lon_open date_time_stamp_close fld_lat_close fld_lon_close  
0  03759.143 E   2017-04-13 09:51:18    4647.361 S   03759.142 E  
1  03759.143 E   2017-04-13 12:11:00    4647.344 S   03759.143 E  
2                2017-04-13 15:09:26    4647.344 S   03759.143 E  

第二个数据帧:

         idtbl_gpgga     date_time_stamp    fld_utc   fld_lat fld_lat_dir  \
1179828      1179829 2017-04-04 02:00:04  000005.00  3354.138           S   
0                  1 2017-04-04 02:00:05  000006.00  3354.138           S   
1                  2 2017-04-04 02:00:07  000008.00  3354.138           S   

          fld_lon fld_lon_dir fld_gps_quality fld_nos fld_hdop fld_alt  \
1179828  1825.557           E               1      10      0.9    21.6   
0        1825.557           E               1      10      0.9    21.6   
1        1825.557           E               1      10      0.9    21.6   

        fld_unit_alt fld_alt_geoid fld_unit_alt_geoid fld_dgps_age fld_dgps_id  
1179828            M          31.9                  M                        0  
0                  M          31.9                  M                        0  
1                  M          31.9                  M                        0  

这可以按预期工作:

# First we grab the open time lat and lons

# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_open", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)

#merge_asof used to get closest match on datetime
pd_open = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_open'], right_on=['date_time_stamp'], direction="nearest")

pd_open["fld_lat_open"] = pd_open["fld_lat"] + ' ' +  pd_open["fld_lat_dir"]
pd_open["fld_lon_open"] = pd_open["fld_lon"] + ' ' +  pd_open["fld_lon_dir"]     

这失败了:

  

' ValueError:必须对左键进行排序'

# Now we grab the close time lat and lons

# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_close", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)

#merge_asof used to get closest match on datetime
pd_close = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_close'], right_on=['date_time_stamp'], direction="nearest")

pd_close["fld_lat_close"] = pd_close["fld_lat"] + ' ' +  pd_close["fld_lat_dir"]
pd_close["fld_lat_close"] = pd_close["fld_lon"] + ' ' +  pd_close["fld_lon_dir"]  

非常感谢任何建议。

1 个答案:

答案 0 :(得分:2)

如@JohnE所述,df_stationManager数据框中存在NaT值。

合并前通过清洁解决:

df_stationManager = df_stationManager.dropna()