Merge two data frames by shortest distance between longitude and latitude

Time: 2019-05-23 01:00:11

Tags: r dataframe fuzzyjoin

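I have a data frame df1 with 800 rows and another, df2, with 9 million rows. Both contain longitude and latitude, and df2 has some additional columns that I need to add to df1. Because the coordinates in the two data frames are not exactly the same, the rows have to be matched by shortest distance. I used the geo_join function from the fuzzyjoin package, but I get an error message.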

Summary of df1:

summary(df1)
          lat             lon           
 Min.   :25.39   Min.   :-124.62   
 1st Qu.:36.20   1st Qu.:-104.94    
 Median :40.63   Median : -84.15   
 Mean   :39.32   Mean   : -89.44    
 3rd Qu.:42.08   3rd Qu.: -73.97    
 Max.   :48.73   Max.   : -67.27  

Summary of df2:

summary(df2)
      lon               lat              x1                 x2                 x3
 Min.   :-124.73   Min.   :24.98   Min.   :-2230806   Min.   :-1569579   Min.   :     0.0  
 1st Qu.:-110.13   1st Qu.:34.78   1st Qu.:-1126720   1st Qu.: -508033   1st Qu.:   670.8  
 Median : -99.17   Median :39.06   Median : -263314   Median :  -15116   Median :  1507.5  
 Mean   : -99.17   Mean   :38.97   Mean   : -239487   Mean   :  -30086   Mean   :  2856.3  
 3rd Qu.: -88.94   3rd Qu.:43.25   3rd Qu.:  578810   3rd Qu.:  466600   3rd Qu.:  3354.7  
 Max.   : -66.97   Max.   :49.38   Max.   : 2122143   Max.   : 1270878   Max.   :395131.9  

Here is my code:

merged.dfs <- geo_join(df1, df2, by = NULL, method = "haversine", mode = "left", max_dist = 1) 

This is the error I get:

Joining by: c("lat", "lon") 

Error in fuzzy_join(x, y, multi_by = by, multi_match_fun = match_fun, : long vectors not supported yet: ../../src/include/Rinlinedfuns.h:522
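A note on the likely cause, plus a possible workaround. R raises "long vectors not supported yet" when some operation needs a vector longer than 2^31 - 1 elements. Comparing every row of df1 with every row of df2 means 800 * 9,000,000 = 7.2e9 candidate pairs, which is above that limit, so it seems plausible (though not confirmed from the fuzzyjoin source) that the join fails while building the full set of pairs. The sketch below avoids materializing all pairs by looping over the 800 rows of df1 and computing a vectorized haversine distance to df2 for one row at a time. The helper names `haversine_km` and `nearest_row`, the Earth radius of 6371 km, and the tiny sample data are assumptions made for illustration only; they are not part of fuzzyjoin.

```r
# Haversine great-circle distance in km. Inputs are in degrees.
# Vectorized over lon2/lat2, so one point can be compared with many.
# (Hypothetical helper; the 6371 km mean Earth radius is an assumption.)
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# For each row of df1, return the index of the closest row in df2.
# Only one df2-length distance vector exists at any time.
nearest_row <- function(df1, df2) {
  vapply(seq_len(nrow(df1)), function(i) {
    which.min(haversine_km(df1$lon[i], df1$lat[i], df2$lon, df2$lat))
  }, integer(1))
}

# Tiny made-up data with the same column names as in the question.
df1 <- data.frame(lat = c(40.0, 30.0), lon = c(-100.0, -90.0))
df2 <- data.frame(lon = c(-90.1, -100.2, -120.0),
                  lat = c(30.1, 40.1, 45.0),
                  x1  = c(10, 20, 30))

idx <- nearest_row(df1, df2)

# Attach the non-coordinate columns of the nearest df2 row, plus the distance.
extra_cols <- setdiff(names(df2), c("lat", "lon"))
merged <- cbind(
  df1,
  df2[idx, extra_cols, drop = FALSE],
  dist_km = haversine_km(df1$lon, df1$lat, df2$lon[idx], df2$lat[idx])
)
rownames(merged) <- NULL
print(merged)
```

With 9 million rows in df2, each iteration allocates a few vectors of that length, so this trades speed for bounded memory. A distance cutoff comparable to max_dist could be applied afterwards by setting the joined columns to NA wherever dist_km exceeds the chosen threshold.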

Thanks for your help!

0 answers