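Hi, I am trying to merge two data sets on the closest matching date_time.
I have two time stamps, one for the open activity and one for the close activity.
merge_asof runs fine on the open date, but raises 'ValueError: left keys must be sorted' on the second date_time.
In both cases I sort by the relevant date_time first.
First data frame: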
idtbl_station_manager date_time_stamp fld_station_number \
0 1121 2017-09-19 15:41:24 AM00571
1 1122 2017-09-19 15:41:24 AM00572
2 1123 2017-09-19 15:41:24 AM00573
fld_grid_number fld_status fld_station_number_int \
0 VOY-024-001 CLOSED 571
1 VOY-024-002 CLOSED 572
2 VOY-024-003 CLOSED 573
fld_activities date_time_stamp_open fld_lat_open \
0 Drift Net,CTD-Overside,Dredge 2017-04-13 07:23:35
1 Drift Net,CTD-Overside,Dredge 2017-04-13 10:15:07 4649.028 S
2 Drift Net,CTD-Overside,Dredge 2017-04-13 13:15:42 4648.497 S
fld_lon_open date_time_stamp_close fld_lat_close fld_lon_close
0 03759.143 E 2017-04-13 09:51:18 4647.361 S 03759.142 E
1 03759.143 E 2017-04-13 12:11:00 4647.344 S 03759.143 E
2 2017-04-13 15:09:26 4647.344 S 03759.143 E
Second data frame:
idtbl_gpgga date_time_stamp fld_utc fld_lat fld_lat_dir \
1179828 1179829 2017-04-04 02:00:04 000005.00 3354.138 S
0 1 2017-04-04 02:00:05 000006.00 3354.138 S
1 2 2017-04-04 02:00:07 000008.00 3354.138 S
fld_lon fld_lon_dir fld_gps_quality fld_nos fld_hdop fld_alt \
1179828 1825.557 E 1 10 0.9 21.6
0 1825.557 E 1 10 0.9 21.6
1 1825.557 E 1 10 0.9 21.6
fld_unit_alt fld_alt_geoid fld_unit_alt_geoid fld_dgps_age fld_dgps_id
1179828 M 31.9 M 0
0 M 31.9 M 0
1 M 31.9 M 0
This works as expected:
# First we grab the open time lat and lons
# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_open", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)
#merge_asof used to get closest match on datetime
pd_open = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_open'], right_on=['date_time_stamp'], direction="nearest")
pd_open["fld_lat_open"] = pd_open["fld_lat"] + ' ' + pd_open["fld_lat_dir"]
pd_open["fld_lon_open"] = pd_open["fld_lon"] + ' ' + pd_open["fld_lon_dir"]
This fails with:
'ValueError: left keys must be sorted'
# Now we grab the close time lat and lons
# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_close", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)
#merge_asof used to get closest match on datetime
pd_close = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_close'], right_on=['date_time_stamp'], direction="nearest")
pd_close["fld_lat_close"] = pd_close["fld_lat"] + ' ' + pd_close["fld_lat_dir"]
pd_close["fld_lon_close"] = pd_close["fld_lon"] + ' ' + pd_close["fld_lon_dir"]
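Any suggestions would be much appreciated.
Answer 0 (score: 2)
As @JohnE pointed out, there were NaT values in the df_stationManager data frame.
Solved by cleaning the data before the merge: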
df_stationManager = df_stationManager.dropna()
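Note that a bare dropna() removes every row that has a missing value in any column, which may discard more stations than intended. A minimal sketch of a narrower cleanup follows, assuming only the merge key matters. The two small frames are hypothetical stand-ins for df_stationManager and df_gpgga, not the real data. The exact error message raised for NaT keys may differ between pandas versions, so the sketch checks for missing keys directly instead of relying on the exception text.

```python
import pandas as pd

# Hypothetical miniature versions of the two frames from the question.
stations = pd.DataFrame({
    "fld_station_number": ["AM00571", "AM00572", "AM00573"],
    "date_time_stamp_close": pd.to_datetime(
        ["2017-04-13 09:51:18", None, "2017-04-13 15:09:26"]
    ),
})
gps = pd.DataFrame({
    "date_time_stamp": pd.to_datetime(
        ["2017-04-13 09:51:20", "2017-04-13 12:11:01", "2017-04-13 15:09:25"]
    ),
    "fld_lat": ["4647.361", "4647.344", "4647.344"],
})

# Diagnose: count missing (NaT) values in the left merge key.
n_missing = stations["date_time_stamp_close"].isna().sum()
print(f"rows with a missing close time: {n_missing}")

# Drop only the rows whose merge key is missing, then sort by that key.
clean = (
    stations.dropna(subset=["date_time_stamp_close"])
    .sort_values("date_time_stamp_close")
)
gps_sorted = gps.sort_values("date_time_stamp")

# Nearest-timestamp merge now has a sorted, null-free left key.
merged = pd.merge_asof(
    clean,
    gps_sorted,
    left_on="date_time_stamp_close",
    right_on="date_time_stamp",
    direction="nearest",
)
print(merged)
```

Stations without a close time are left out of the result here. If they need to be kept, one option is to concatenate the dropped rows back onto the merged frame afterwards.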