我正在尝试在两个数据帧上执行多个匹配列的左连接-所有列均为数字类:
第一个是test_trips
:
> test_trips
start_station_longitude start_station_latitude end_station_longitude end_station_latitude
1 -122.4264 37.76142 -122.4312 37.78352
2 -122.3942 37.79539 -122.3942 37.79539
3 -121.8948 37.34876 -121.8771 37.32600
4 -122.4212 37.77379 -122.3997 37.78095
5 -122.4262 37.77643 -122.4035 37.78898
6 -122.2825 37.80369 -122.2948 37.80532
第二个是test_distance
:
> test_distance
start_station_longitude start_station_latitude end_station_longitude end_station_latitude distance
1 -122.4264 37.76142 -122.4312 37.78352 3162
2 -122.3942 37.79539 -122.3942 37.79539 0
3 -121.8948 37.34876 -121.8771 37.32600 3591
4 -122.4212 37.77379 -122.3997 37.78095 2843
5 -122.4262 37.77643 -122.4035 37.78898 3088
6 -122.2825 37.80369 -122.2948 37.80532 1198
请注意,两个表的4列中每列的坐标值如何完全相同。现在为什么这种左联接不起作用?
output <- dplyr::left_join(test_trips, test_distance, by=c( "start_station_longitude", "start_station_latitude", "end_station_longitude", "end_station_latitude"))
输出产生 4个NA值,我不明白-怎么了?我的猜测是与test_distance
的结构有关,但是我不确定如何修改它,因为class()
告诉我它是正常的data.frame
> output
start_station_longitude start_station_latitude end_station_longitude end_station_latitude distance
1 -122.4264 37.76142 -122.4312 37.78352 NA
2 -122.3942 37.79539 -122.3942 37.79539 0
3 -121.8948 37.34876 -121.8771 37.32600 NA
4 -122.4212 37.77379 -122.3997 37.78095 NA
5 -122.4262 37.77643 -122.4035 37.78898 NA
6 -122.2825 37.80369 -122.2948 37.80532 1198
这是示例数据:
> dput(test_trips)
structure(list(start_station_longitude = c(-122.4264353, -122.394203,
-121.894797831774, -122.42123901844, -122.426244020462, -122.282497
), start_station_latitude = c(37.7614205, 37.795392, 37.3487586867448,
37.7737932060887, 37.7764348192047, 37.8036865), end_station_longitude = c(-122.431157827377,
-122.394203, -121.87712, -122.399749159813, -122.403452, -122.2948365
), end_station_latitude = c(37.783520835261, 37.795392, 37.3259984,
37.7809545996075, 37.788975, 37.8053183)), row.names = c(NA,
6L), class = "data.frame")
dput(test_distance)
structure(list(start_station_longitude = c(-122.4264353, -122.394203,
-121.894797831774, -122.42123901844, -122.426244020462, -122.282497
), start_station_latitude = c(37.7614205, 37.795392, 37.3487586867448,
37.7737932060887, 37.7764348192047, 37.8036865), end_station_longitude = c(-122.431157827377,
-122.394203, -121.87712, -122.399749159813, -122.403452, -122.2948365
), end_station_latitude = c(37.783520835261, 37.795392, 37.3259984,
37.7809545996075, 37.788975, 37.8053183), distance = c(3162,
0, 3591, 2843, 3088, 1198)), class = "data.frame", row.names = c(NA,
6L))
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.3
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.7.8 readr_1.3.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 crayon_1.3.4 assertthat_0.2.0 R6_2.3.0 jsonlite_1.6
[6] magrittr_1.5 pillar_1.3.1 rlang_0.3.1 rstudioapi_0.9.0 bindrcpp_0.2.2
[11] tools_3.5.2 glue_1.3.0 purrr_0.3.0 hms_0.4.2 yaml_2.2.0
[16] compiler_3.5.2 pkgconfig_2.0.2 tidyselect_0.2.5 bindr_0.1.1 tibble_2.0.1