How to assign several names to lat-lon observations

Time: 2014-02-23 17:35:11

Tags: r distance geographic-distance

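I have two data frames: df1 contains observations with lat-lon coordinates; df2 contains names with lat-lon coordinates. I want to create a new variable df1$names that holds, for each observation, the names from df2 that lie within a specified distance of that observation.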

Some example data for df1:

df1 <- structure(list(lat = c(52.768, 53.155, 53.238, 53.253, 53.312, 53.21, 53.21, 53.109, 53.376, 53.317, 52.972, 53.337, 53.208, 53.278, 53.316, 53.288, 53.341, 52.945, 53.317, 53.249), lon = c(6.873, 6.82, 6.81, 6.82, 6.84, 6.748, 6.743, 6.855, 6.742, 6.808, 6.588, 6.743, 6.752, 6.845, 6.638, 6.872, 6.713, 6.57, 6.735, 6.917), cat = c(2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 3L, 2L, 2L, 2L, 2L, 2L), diff = c(6.97305555555555, 3.39815972222222, 14.2874305555556, -0.759791666666667, 34.448275462963, 4.38783564814815, 0.142430555555556, 0.698599537037037, 1.22914351851852, 7.0008912037037, 1.3349537037037, 8.67978009259259, 1.6090162037037,    25.9466782407407, 9.45068287037037, 4.76284722222222, 1.79163194444444, 16.8280787037037, 1.01336805555556, 3.51240740740741)), .Names = c("lat", "lon", "cat", "diff"), row.names = c(125L, 705L, 435L, 682L, 186L, 783L, 250L, 517L, 547L, 369L, 618L, 280L, 839L, 614L, 371L, 786L, 542L, 100L, 667L, 785L), class = "data.frame")

Some example data for df2:

df2 <- structure(list(latlonloc = structure(c(6L, 3L, 4L, 2L, 5L, 1L), .Label = c("Boelenslaan", "Borgercompagnie", "Froombosch", "Garrelsweer", "Stitswerd", "Tinallinge"), class = "factor"), lat = c(53.356789, 53.193886, 53.311237, 53.111339, 53.360848, 53.162031), lon = c(6.53493, 6.780792, 6.768608, 6.82354, 6.599604, 6.143804)), .Names = c("latlonloc", "lat", "lon"), class = "data.frame", row.names = c(NA, -6L))

Creating a distance matrix with the geosphere package:

library(geosphere)
mat <- distm(df1[,c('lon','lat')], df2[,c('lon','lat')], fun=distHaversine)

The resulting distances are in meters (at least I think they are; otherwise something is wrong with the distance matrix).

The specified distance is calculated with ((df1$cat)^2)*1000. I tried df1$names <- df2$latlonloc[apply(distmat, 1, which(distmat < ((df1$cat)^2)*1000 ))], but got an error message:

Error in match.fun(FUN) : 
  'which(distmat < ((df1$cat)^2) * 1000)' is not a function, character or symbol

This is probably not the right approach, but what I need is:

df1$names <- #code or function which gives me a string of names which are within a specified distance of the observation

How can I create a string containing the names that are within the specified distance of each observation?

1 Answer:

Answer 0: (score: 1)

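You need to operate on each row of df1 (or mat) so that, for each row, you have the distance to every object in df2. From that, you can select the ones that meet the distance criterion.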
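As a minimal sketch of that selection step for a single observation, the snippet below uses made-up distances and place names (toy_dist, toy_cat and the names Alpha, Beta, Gamma are all hypothetical, not taken from df1 or df2) and the threshold rule cat^2 * 1000 from the question:

```r
# Hypothetical distances (in metres) from one observation to three named places
toy_dist <- c(Alpha = 3200, Beta = 850, Gamma = 12000)

# Hypothetical category of that observation; threshold follows cat^2 * 1000
toy_cat   <- 2L
threshold <- toy_cat^2 * 1000   # 4000 metres

# Keep only the names whose distance is below this observation's threshold
within <- names(toy_dist)[toy_dist < threshold]
within
```

The full problem repeats this comparison once per row of the distance matrix, each row with its own threshold.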

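I think you're a little confused about how apply and which are used. The third argument of apply must be a function, but which(...) in your code is a call that is evaluated first and returns a vector of indices, hence the error. To get which to really work for you, you need to apply it to each row of mat, whereas your current code applies it to the entire mat matrix. Note also that apply is awkward to use here, because you want to compare each row of mat with the corresponding element of the vector defined by ((df1$cat)^2)*1000. So I'll show you examples using sapply and lapply. You could also use mapply here, but I think the sapply/lapply syntax is cleaner.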
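To make both points concrete, here is a small sketch on toy data (toy_mat, toy_cat and toy_names are hypothetical stand-ins for mat, df1$cat and df2$latlonloc). It first reproduces the kind of error apply raises when it is handed the result of a call instead of a function, then shows one possible mapply formulation that pairs each row with its own threshold:

```r
# Hypothetical 3 x 2 distance matrix (metres): rows = observations, cols = places
toy_mat <- matrix(c( 500, 3000,
                    2500, 3500,
                    9000,  800),
                  nrow = 3, byrow = TRUE)
toy_cat   <- c(1L, 2L, 1L)
toy_names <- c("A", "B")
threshold <- toy_cat^2 * 1000   # one threshold per row: 1000, 4000, 1000

# Passing a call instead of a function: which(...) is evaluated first and
# returns integers, so apply() cannot use it as FUN and signals an error
err <- tryCatch(apply(toy_mat, 1, which(toy_mat < 1000)),
                error = function(e) conditionMessage(e))

# mapply variant: split the matrix into a list of rows, then walk the rows
# and the thresholds in parallel
rows <- split(toy_mat, row(toy_mat))
res <- mapply(function(d, thr) paste(toy_names[d < thr], collapse = ","),
              rows, threshold, USE.NAMES = FALSE)
res
```

The mapply version needs the extra split step to turn the matrix into a list of rows, which is part of why indexing by row number inside sapply or lapply tends to read more simply.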

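To address the output you want, I show two examples. The first returns a list containing, for each row of df1, the names of the items in df2 that are within the distance threshold. This can't easily be put back into the original df1 as a variable, because each element of the list can contain multiple names. The second example pastes those names together into a single comma-separated string, which creates the new variable you're looking for.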

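Example 1:

# For each row of df1, keep the df2 names whose distance is below that row's threshold
out1 <- lapply(seq_len(nrow(df1)), function(x) {
    df2[which(mat[x,] < (((df1$cat)^2)*1000)[x]),'latlonloc']
})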

Result:

> str(out1)
List of 20
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 2
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 4
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 6 4 5
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 4
 $ : Factor w/ 6 levels "Boelenslaan",..: 

Example 2:

# Same selection as Example 1, but collapse the names of each row into one string
out2 <- sapply(seq_len(nrow(df1)), function(x) {
    paste(df2[which(mat[x,] < (((df1$cat)^2)*1000)[x]),'latlonloc'], collapse=',')
})

Result:

> out2
 [1] ""                                 ""                                
 [3] ""                                 ""                                
 [5] ""                                 ""                                
 [7] ""                                 "Borgercompagnie"                 
 [9] ""                                 "Garrelsweer"                     
[11] ""                                 ""                                
[13] ""                                 ""                                
[15] "Tinallinge,Garrelsweer,Stitswerd" ""                                
[17] ""                                 ""                                
[19] "Garrelsweer"                      ""

I think the second of these is probably closest to what you're after.
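As a rough sketch of the final step, the collapsed strings can be stored as an ordinary column. The example below uses toy stand-ins (toy_df and toy_out1 are hypothetical, with made-up values) so that it runs on its own; with the real data the equivalent assignment would presumably be df1$names <- out2.

```r
# Hypothetical stand-in for df1
toy_df <- data.frame(lat = c(53.1, 53.3, 53.2), cat = c(2L, 2L, 3L))

# Hypothetical stand-in for out1: one vector of names per row, possibly empty
toy_out1 <- list(character(0), "Garrelsweer", c("Tinallinge", "Stitswerd"))

# Collapse each list element to a single string so it fits an ordinary column
toy_df$names <- vapply(toy_out1, paste, character(1), collapse = ",")

# Optional: mark rows without any match as NA instead of an empty string
toy_df$names[toy_df$names == ""] <- NA
toy_df
```

Whether to keep empty strings or convert them to NA is a design choice; NA is easier to filter with is.na(), while empty strings keep the column free of missing values.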