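I have a data frame df1 with 800 rows and another, df2, with 9 million rows. Both have longitude and latitude columns, and df2 has some additional columns that I need to add to df1 based on the shortest distance, because the latitude and longitude values are not exactly the same in the two data frames. I used geo_join from the fuzzyjoin package, but I got an error message.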
Summary of df1:
summary(df1)
lat lon
Min. :25.39 Min. :-124.62
1st Qu.:36.20 1st Qu.:-104.94
Median :40.63 Median : -84.15
Mean :39.32 Mean : -89.44
3rd Qu.:42.08 3rd Qu.: -73.97
Max. :48.73 Max. : -67.27
Summary of df2:
summary(df2)
lon lat x1 x2 x3
Min. :-124.73 Min. :24.98 Min. :-2230806 Min. :-1569579 Min. : 0.0
1st Qu.:-110.13 1st Qu.:34.78 1st Qu.:-1126720 1st Qu.: -508033 1st Qu.: 670.8
Median : -99.17 Median :39.06 Median : -263314 Median : -15116 Median : 1507.5
Mean : -99.17 Mean :38.97 Mean : -239487 Mean : -30086 Mean : 2856.3
3rd Qu.: -88.94 3rd Qu.:43.25 3rd Qu.: 578810 3rd Qu.: 466600 3rd Qu.: 3354.7
Max. : -66.97 Max. :49.38 Max. : 2122143 Max. : 1270878 Max. :395131.9
Here is my code:
merged.dfs <- geo_join(df1, df2, by = NULL, method = "haversine", mode = "left", max_dist = 1)
Here is the error I get:
Joining by: c("lat", "lon")
Error in fuzzy_join(x, y, multi_by = by, multi_match_fun = match_fun, : long vectors not supported yet: ../../src/include/Rinlinedfuns.h:522
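To make the goal concrete, here is a minimal base R sketch of the result I am after: each row of df1 gets the extra columns of its nearest df2 row. It loops over the df1 rows so that only one vector of df2 length is held at a time, which is probably relevant because 800 x 9 million pairs exceeds R's 2^31 - 1 limit for ordinary vectors. The helper names haversine_km and nearest_join, the dist_km column, and the toy data are all made up for illustration; this is not fuzzyjoin code and I have not run it on the full data.

```r
# Haversine distance in km from one point to a vector of points (inputs in degrees).
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: for each row of df1, attach the non-coordinate
# columns of the closest df2 row, plus the distance to that row.
nearest_join <- function(df1, df2) {
  idx <- integer(nrow(df1))
  dist_km <- numeric(nrow(df1))
  for (i in seq_len(nrow(df1))) {
    d <- haversine_km(df1$lat[i], df1$lon[i], df2$lat, df2$lon)
    idx[i] <- which.min(d)
    dist_km[i] <- d[idx[i]]
  }
  extra <- df2[idx, setdiff(names(df2), c("lat", "lon")), drop = FALSE]
  rownames(extra) <- NULL
  cbind(df1, extra, dist_km = dist_km)
}

# Toy data standing in for the real data frames.
df1_toy <- data.frame(lat = c(40.0, 30.0), lon = c(-100.0, -90.0))
df2_toy <- data.frame(lon = c(-90.1, -100.2, -70.0),
                      lat = c(30.1, 40.1, 45.0),
                      x1  = c(1, 2, 3))
merged_toy <- nearest_join(df1_toy, df2_toy)
```

The dist_km column could then be used to drop matches that are farther away than some cutoff, similar in spirit to max_dist in geo_join.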
Thanks for your help!