Dplyr错误地连接了多个列

时间:2019-02-23 04:28:17

标签: r dplyr data.table left-join

我正在尝试在两个数据帧上执行多个匹配列的左连接-所有列均为数字类:

第一个是test_trips

> test_trips
  start_station_longitude start_station_latitude end_station_longitude end_station_latitude
1               -122.4264               37.76142             -122.4312             37.78352
2               -122.3942               37.79539             -122.3942             37.79539
3               -121.8948               37.34876             -121.8771             37.32600
4               -122.4212               37.77379             -122.3997             37.78095
5               -122.4262               37.77643             -122.4035             37.78898
6               -122.2825               37.80369             -122.2948             37.80532

第二个是test_distance

> test_distance
  start_station_longitude start_station_latitude end_station_longitude end_station_latitude distance
1               -122.4264               37.76142             -122.4312             37.78352     3162
2               -122.3942               37.79539             -122.3942             37.79539        0
3               -121.8948               37.34876             -121.8771             37.32600     3591
4               -122.4212               37.77379             -122.3997             37.78095     2843
5               -122.4262               37.77643             -122.4035             37.78898     3088
6               -122.2825               37.80369             -122.2948             37.80532     1198

请注意,两个表的4列中每列的坐标值如何完全相同。现在为什么这种左联接不起作用?

output <- dplyr::left_join(test_trips, test_distance, by=c( "start_station_longitude", "start_station_latitude",  "end_station_longitude", "end_station_latitude"))

输出产生 4个NA值,我不明白-怎么了?我的猜测是与test_distance的结构有关,但是我不确定如何修改它,因为class()告诉我它是正常的data.frame

> output
  start_station_longitude start_station_latitude end_station_longitude end_station_latitude distance
1               -122.4264               37.76142             -122.4312             37.78352       NA
2               -122.3942               37.79539             -122.3942             37.79539        0
3               -121.8948               37.34876             -121.8771             37.32600       NA
4               -122.4212               37.77379             -122.3997             37.78095       NA
5               -122.4262               37.77643             -122.4035             37.78898       NA
6               -122.2825               37.80369             -122.2948             37.80532     1198

这是示例数据:

> dput(test_trips)
structure(list(start_station_longitude = c(-122.4264353, -122.394203, 
-121.894797831774, -122.42123901844, -122.426244020462, -122.282497
), start_station_latitude = c(37.7614205, 37.795392, 37.3487586867448, 
37.7737932060887, 37.7764348192047, 37.8036865), end_station_longitude = c(-122.431157827377, 
-122.394203, -121.87712, -122.399749159813, -122.403452, -122.2948365
), end_station_latitude = c(37.783520835261, 37.795392, 37.3259984, 
37.7809545996075, 37.788975, 37.8053183)), row.names = c(NA, 
6L), class = "data.frame")

dput(test_distance)
structure(list(start_station_longitude = c(-122.4264353, -122.394203, 
-121.894797831774, -122.42123901844, -122.426244020462, -122.282497
), start_station_latitude = c(37.7614205, 37.795392, 37.3487586867448, 
37.7737932060887, 37.7764348192047, 37.8036865), end_station_longitude = c(-122.431157827377, 
-122.394203, -121.87712, -122.399749159813, -122.403452, -122.2948365
), end_station_latitude = c(37.783520835261, 37.795392, 37.3259984, 
37.7809545996075, 37.788975, 37.8053183), distance = c(3162, 
0, 3591, 2843, 3088, 1198)), class = "data.frame", row.names = c(NA, 
6L))


> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.7.8 readr_1.3.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0       crayon_1.3.4     assertthat_0.2.0 R6_2.3.0         jsonlite_1.6    
 [6] magrittr_1.5     pillar_1.3.1     rlang_0.3.1      rstudioapi_0.9.0 bindrcpp_0.2.2  
[11] tools_3.5.2      glue_1.3.0       purrr_0.3.0      hms_0.4.2        yaml_2.2.0      
[16] compiler_3.5.2   pkgconfig_2.0.2  tidyselect_0.2.5 bindr_0.1.1      tibble_2.0.1  

0 个答案:

没有答案