Question

I have two data frames of different sizes containing geocodes. The first (df) has 12,000 observations and the second (schools) 3,000.

The first contains geocodes for properties in a country and the second for schools in the country.

I want to find the distance to the nearest school for each property. Using the geosphere package I'm currently working with the following:

library(geosphere)
for(i in 1:length(df$longitude)){
  df$dist2[i] <- distm(c(schools[1, 3], schools[1, 2]), c(df$longitude[i], df$latitude[i]), fun = distHaversine)  *0.001
}

where schools[, 3] and schools[, 2] are the longitude and latitude columns of that data frame respectively.

The above calculates the distance (in km) between all observations in df and the first school in schools.

I want to calculate the distance between each observation and all schools, saving only the smallest distance as that value for df$dist2[i].

Answer 1

In the following example, I made up longitude/latitude data for the points and the schools.

library(tidyverse)
library(geosphere)

# Made-up coordinates: 10 points and 3 schools
df_points  <- data.frame(lon = rnorm(10, mean = 4, sd = 0.5), lat = rnorm(10, mean = 50, sd = 0.1))
df_schools <- data.frame(lon = rnorm( 3, mean = 4, sd = 0.5), lat = rnorm( 3, mean = 50, sd = 0.1))

# distm returns a 10 x 3 matrix of distances in metres (rows = points, columns = schools)
distm(df_points, df_schools, fun = distHaversine) %>%
  as.data.frame() %>%
  rownames_to_column(var = "point_id") %>%
  mutate(point_id = as.numeric(point_id)) %>%
  gather(key = school, value = distance, -point_id) %>%   # one row per point/school pair
  group_by(point_id) %>%
  summarise(smallest_distance = min(distance))            # keep the minimum per point
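For the data frames in the question, the same reduction can be written without reshaping, by taking the row-wise minimum of the distance matrix. This is a minimal sketch, not the answerer's method: the column names `longitude` and `latitude` in `schools`, the `nearest_school` column, and the made-up coordinates are assumptions for illustration (the question only says that columns 3 and 2 of `schools` hold longitude and latitude).

```r
library(geosphere)

# Made-up stand-ins for the question's data frames (values are illustrative only)
set.seed(1)
df      <- data.frame(longitude = runif(5, 3, 5), latitude = runif(5, 49.5, 50.5))
schools <- data.frame(longitude = runif(3, 3, 5), latitude = runif(3, 49.5, 50.5))

# distm expects (longitude, latitude) column order and returns metres;
# rows correspond to properties, columns to schools
d <- distm(df[, c("longitude", "latitude")],
           schools[, c("longitude", "latitude")],
           fun = distHaversine)

df$dist2          <- apply(d, 1, min) * 0.001   # nearest-school distance in km
df$nearest_school <- apply(d, 1, which.min)     # row index of that school in `schools`
```

With 12,000 properties and 3,000 schools the full matrix has 36 million entries, so if memory is tight it may be worth computing it in chunks of rows.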

Answer 2

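Here is an approach using sp-class objects. You can coerce a data.frame object to a SpatialPointsDataFrame object with something like coordinates(x) <- ~lon+lat. The idea is to derive a distance matrix between the two point feature classes, then extract the distance and the ID (assigned from the school data) based on the column names. This returns not only the distance but also the unique identifier of each school, which makes it easy to look up the actual school closest to any given property feature.

First, load the required libraries and create some example data.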

library(sp)
library(raster)

# Sample random points over the whole globe
e <- as(raster::extent(-180, 180, -90, 90), "SpatialPolygons")
properties <- spsample(e, 1000, type = "random")
proj4string(properties) <- "+proj=longlat +ellps=WGS84"
schools <- spsample(e, 100, type = "random")
proj4string(schools) <- "+proj=longlat +ellps=WGS84"
schools$ids <- paste0("school", 1:length(schools))   # unique ID per school
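If your data starts out as a data frame rather than a random sample, the coercion mentioned above might look like the following sketch. The data frame `props_df` and its values are hypothetical; only `coordinates<-`, `proj4string<-` and `CRS` come from sp.

```r
library(sp)

# Hypothetical data frame with lon/lat columns (values are made up)
props_df <- data.frame(id  = 1:3,
                       lon = c(4.1, 4.5, 3.9),
                       lat = c(50.0, 50.1, 49.9))

# Promote to a SpatialPointsDataFrame; lon and lat move out of the
# attribute table and become the point coordinates
coordinates(props_df) <- ~lon + lat
proj4string(props_df) <- CRS("+proj=longlat +ellps=WGS84")

class(props_df)
```

The formula lists longitude first, matching the x/y order that sp expects.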

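Now we can create the distance matrix and add the schools' unique identifiers as its column names. Because the rows (properties) and the columns (schools) index different point sets, the diagonal has no special meaning here and should not be set to NA; doing so would discard valid property-to-school distances. With longlat = TRUE, spDists returns great-circle distances in kilometres.

d <- spDists(x = properties, y = schools, longlat = TRUE)
colnames(d) <- schools$ids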

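There are certainly more elegant ways to do this, but for simplicity we will use a for loop to populate two vectors holding the distance and the ID. We use which.min to pull the index of the minimum distance in row i. The iterator runs over the matrix rows, since they represent the property features.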

sdist <- rep(NA, nrow(d))
sid <- rep(NA, nrow(d))
for (i in 1:nrow(d)) {
  srow <- d[i, ]
  sdist[i] <- srow[which.min(srow)]       # smallest distance in row i
  sid[i] <- names(srow)[which.min(srow)]  # ID of the corresponding school
}
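One loop-free way to get the same two vectors is to compute the column index of each row minimum once and reuse it. The sketch below uses a small made-up matrix so it runs on its own; with the real data, `d` would be the distance matrix with school IDs as column names.

```r
# Small made-up distance matrix: 3 properties (rows) x 2 schools (columns)
d <- matrix(c(5.0, 2.5,
              1.2, 3.3,
              4.0, 4.1),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("school1", "school2")))

idx   <- apply(d, 1, which.min)             # column index of the nearest school per row
sdist <- d[cbind(seq_len(nrow(d)), idx)]    # matrix indexing picks one cell per row
sid   <- colnames(d)[idx]                   # school ID for each of those cells
```

Indexing a matrix with a two-column matrix of (row, column) pairs returns one value per pair, which is what replaces the loop body here.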

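We can then assign the resulting vectors to the properties SpatialPointsDataFrame. The data.frame in the @data slot now has columns holding the distance to the nearest school and that school's ID.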

properties$school <- sid
properties$dist <- sdist

Here we can plot the results.

par(mfrow = c(2, 1))

# Panel 1: all properties and schools
plot(properties, pch = 19, cex = 0.5)
plot(schools, pch = 19, col = "red", add = TRUE)
plot(e, add = TRUE)
title("random properties (black) and schools (red)", cex = 0.5)

# Panel 2: the first property and its nearest school
plot(properties, col = "white")
plot(properties[1, ], pch = 19, cex = 2, add = TRUE)
sidx <- which(schools$ids %in% properties[1, ]$school)
plot(schools[sidx, ], pch = 19, cex = 2, col = "red", add = TRUE)
plot(e, add = TRUE)
title("Property 1 (black) and closest school (red)", cex = 0.5)
text(coordinates(schools[sidx, ]), labels = schools[sidx, ]$ids, col = "blue", cex = 1)

Minimum distance between two sets of co-ordinates
