Find the best-matching row in a data frame

Time: 2018-05-24 06:36:42

Tags: r dataframe

I have a dataset containing some locations:

ex <- data.frame(lat = c(55, 60, 40), long = c(6, 6, 10))

I also have climate data:

clim <- structure(list(lat = c(55.047, 55.097, 55.146, 55.004, 55.054, 
55.103, 55.153, 55.202, 55.252, 55.301), long = c(6.029, 6.0171, 
6.0051, 6.1269, 6.1151, 6.1032, 6.0913, 6.0794, 6.0675, 6.0555
), alt = c(0.033335, 0.033335, 0.033335, 0.033335, 0.033335, 
0.033335, 0.033335, 0.033335, 0.033335, 0.033335), x = c(0, 0, 
0, 0, 0, 0, 0, 0, 0, 0), y = c(1914, 1907.3, 1901.8, 1921.1, 
1914.1, 1908.3, 1902.4, 1896, 1889.8, 1884)), row.names = c(NA, 
10L), class = "data.frame", .Names = c("lat", "long", "alt", 
"x", "y"))

      lat   long      alt x      y
1  55.047 6.0290 0.033335 0 1914.0
2  55.097 6.0171 0.033335 0 1907.3
3  55.146 6.0051 0.033335 0 1901.8
4  55.004 6.1269 0.033335 0 1921.1
5  55.054 6.1151 0.033335 0 1914.1
6  55.103 6.1032 0.033335 0 1908.3
7  55.153 6.0913 0.033335 0 1902.4
8  55.202 6.0794 0.033335 0 1896.0
9  55.252 6.0675 0.033335 0 1889.8
10 55.301 6.0555 0.033335 0 1884.0

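What I want to do is "merge" the two datasets so that the climate data are added to ex. The lat values in ex differ from the lat values in clim, so the two cannot be merged directly (the same goes for long). I need to find the best match: for each row of ex, the point in clim that is closest in terms of lat and long.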

The expected output for this example is:

  lat long      alt x      y
1  55    6 0.033335 0 1914.0
2  60    6 0.033335 0 1884.0
3  40   10 0.033335 0 1921.1
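Matching on raw lat/long treats one degree of longitude as equal to one degree of latitude, which is only approximately true far from the equator. If true ground distance matters, a minimal sketch is to rank clim rows by great-circle distance instead. The helper haversine_km is hypothetical (not from either answer), and it assumes a spherical Earth with a mean radius of 6371 km. For this example data it picks the same clim rows as the expected output.

```r
# Hypothetical helper: great-circle (haversine) distance in km from one point
# (lat1, lon1) to vectors of points (lat2, lon2). Assumes a spherical Earth.
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

ex <- data.frame(lat = c(55, 60, 40), long = c(6, 6, 10))
clim <- data.frame(
  lat  = c(55.047, 55.097, 55.146, 55.004, 55.054,
           55.103, 55.153, 55.202, 55.252, 55.301),
  long = c(6.029, 6.0171, 6.0051, 6.1269, 6.1151,
           6.1032, 6.0913, 6.0794, 6.0675, 6.0555),
  alt  = rep(0.033335, 10),
  x    = rep(0, 10),
  y    = c(1914, 1907.3, 1901.8, 1921.1, 1914.1,
           1908.3, 1902.4, 1896, 1889.8, 1884)
)

# For each row of ex, the index of the clim row with the smallest great-circle distance
idx <- vapply(seq_len(nrow(ex)), function(i) {
  which.min(haversine_km(ex$lat[i], ex$long[i], clim$lat, clim$long))
}, integer(1))

# Attach the climate columns of the matched rows to ex
result <- cbind(ex, clim[idx, c("alt", "x", "y")])
rownames(result) <- NULL
result
```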

2 answers:

Answer 0 (score: 3)

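The dist function (from the stats package) computes the Euclidean (or other) distances between all rows of a matrix or data frame, so one way to find the points in clim closest to those in ex is: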

# Distance between all points in ex and clim combined,
# with distances between points in same matrix filtered out
n <- nrow(ex)
tmp <- as.matrix(dist(rbind(ex, clim[, 1:2])))[-(1:n), 1:n]

# Indices in clim corresponding to the closest points to those in ex
idx <- apply(tmp, 2, which.min)

# Points from ex with additional info from closest points in clim
cbind(ex, clim[idx, -(1:2)])
#>    lat long      alt x      y
#> 1   55    6 0.033335 0 1914.0
#> 10  60    6 0.033335 0 1884.0
#> 4   40   10 0.033335 0 1921.1
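dist(rbind(ex, clim[, 1:2])) builds the full (n + m) x (n + m) distance matrix, including all clim-to-clim and ex-to-ex distances that are discarded afterwards. If clim is large, a sketch that computes only the m x n cross distances with base R's outer() may use far less memory. It uses squared Euclidean distance, which preserves the ordering of distances, so which.min selects the same rows as the dist() version above.

```r
ex <- data.frame(lat = c(55, 60, 40), long = c(6, 6, 10))
clim <- data.frame(
  lat  = c(55.047, 55.097, 55.146, 55.004, 55.054,
           55.103, 55.153, 55.202, 55.252, 55.301),
  long = c(6.029, 6.0171, 6.0051, 6.1269, 6.1151,
           6.1032, 6.0913, 6.0794, 6.0675, 6.0555),
  alt  = rep(0.033335, 10),
  x    = rep(0, 10),
  y    = c(1914, 1907.3, 1901.8, 1921.1, 1914.1,
           1908.3, 1902.4, 1896, 1889.8, 1884)
)

# Squared Euclidean distances: rows are clim points, columns are ex points
d2 <- outer(clim$lat, ex$lat, "-")^2 + outer(clim$long, ex$long, "-")^2

# Index of the closest clim row for each ex row (column-wise minimum)
idx <- apply(d2, 2, which.min)

# Points from ex with the extra columns from the closest clim rows
result <- cbind(ex, clim[idx, c("alt", "x", "y")])
result
```

The cross matrix still has nrow(clim) * nrow(ex) entries, so for very large inputs it may need to be processed in chunks of ex rows.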

Answer 1 (score: 1)

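For each row of ex, you can find the index of the row in clim with the smallest absolute difference in lat and long, and then add the clim columns to ex based on that index: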

ex
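The code of this answer is cut off. A minimal sketch of the idea it describes follows, assuming that "smallest absolute difference" means the smallest sum |lat difference| + |long difference| (Manhattan distance); this is an assumption and not necessarily the answerer's code. Manhattan and Euclidean distance can pick different rows in general, although here they agree with the expected output.

```r
ex <- data.frame(lat = c(55, 60, 40), long = c(6, 6, 10))
clim <- data.frame(
  lat  = c(55.047, 55.097, 55.146, 55.004, 55.054,
           55.103, 55.153, 55.202, 55.252, 55.301),
  long = c(6.029, 6.0171, 6.0051, 6.1269, 6.1151,
           6.1032, 6.0913, 6.0794, 6.0675, 6.0555),
  alt  = rep(0.033335, 10),
  x    = rep(0, 10),
  y    = c(1914, 1907.3, 1901.8, 1921.1, 1914.1,
           1908.3, 1902.4, 1896, 1889.8, 1884)
)

# For each ex row, the clim row minimising |lat difference| + |long difference|
idx <- sapply(seq_len(nrow(ex)), function(i) {
  which.min(abs(clim$lat - ex$lat[i]) + abs(clim$long - ex$long[i]))
})

# Copy the climate columns of the matched clim rows into ex
ex[c("alt", "x", "y")] <- clim[idx, c("alt", "x", "y")]
ex
```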