I have two data frames of different sizes containing geocodes. The first (df) has 12,000 observations and the second (schools) has 3,000.
The first contains geocodes for properties in a country and the second for schools in that country.
I want to find the distance to the nearest school for each property. Using the geosphere package, I'm currently working with the following:
library(geosphere)
for(i in 1:length(df$longitude)){
df$dist2[i] <- distm(c(schools[1, 3], schools[1, 2]), c(df$longitude[i], df$latitude[i]), fun = distHaversine) *0.001
}
where schools[, 3] and schools[, 2] are the longitude and latitude columns of that data frame, respectively.
The above calculates the distance (in km) between every observation in df and the first school in schools.
I want to calculate the distance between each observation and all schools, saving only the smallest distance as the value of df$dist2[i].
Answer 0 (score: 1)
In the following example, I made up longitude/latitude data for the points and the schools.
library(tidyverse)
library(geosphere)
df_points <- data.frame(lon = rnorm(10, mean = 4, sd = 0.5), lat = rnorm(10, mean = 50, sd = 0.1))
df_schools <- data.frame(lon = rnorm(3, mean = 4, sd = 0.5), lat = rnorm(3, mean = 50, sd = 0.1))
# rows = points, columns = schools; distHaversine returns metres
distm(df_points, df_schools, fun = distHaversine) %>%
  as.data.frame() %>%
  rownames_to_column(var = "point_id") %>%
  mutate(point_id = as.numeric(point_id)) %>%
  gather(key = school, value = distance, -point_id) %>%
  group_by(point_id) %>%
  summarise(smallest_distance = min(distance))
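To tie the distance matrix back to the question's df and schools, here is a minimal base R sketch. It assumes the column layout described in the question (schools[, 3] is longitude, schools[, 2] is latitude); the sample data, the id column, and the nearest_school column are made up for illustration.

```r
library(geosphere)

set.seed(42)  # made-up data, only for illustration
df <- data.frame(longitude = rnorm(10, mean = 4, sd = 0.5),
                 latitude  = rnorm(10, mean = 50, sd = 0.1))
schools <- data.frame(id        = 1:3,   # hypothetical identifier column
                      latitude  = rnorm(3, mean = 50, sd = 0.1),
                      longitude = rnorm(3, mean = 4, sd = 0.5))

# Distance matrix: one row per property, one column per school (metres)
d <- distm(as.matrix(df[, c("longitude", "latitude")]),
           as.matrix(schools[, c(3, 2)]),
           fun = distHaversine)

# Smallest distance per row, converted to km as in the question
df$dist2 <- apply(d, 1, min) * 0.001
# Which school that minimum belongs to
df$nearest_school <- schools$id[apply(d, 1, which.min)]
```

With 12,000 properties and 3,000 schools the full matrix holds 36 million doubles, roughly 290 MB, so it may be worth processing df in chunks if memory is tight.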
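Answer 1 (score: 1)
Here is an approach using sp class objects. You can coerce a data.frame object to a SpatialPointsDataFrame object with something like coordinates(x) <- ~lon+lat.
The idea here is to derive a distance matrix between the two point feature classes and then extract the distance and ID (assigned from the schools data) based on the column names. This returns not only the distance but also a unique identifier for each school, which makes it easy to query the actual school closest to any given property feature.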
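As a small illustration of that coercion, the sketch below converts a made-up data.frame; the column names lon, lat and name, and the coordinate values, are assumptions chosen for the example.

```r
library(sp)

# Hypothetical data.frame with longitude/latitude columns
x <- data.frame(lon  = c(4.1, 4.3),
                lat  = c(50.0, 50.1),
                name = c("a", "b"))

# Promote to a SpatialPointsDataFrame; lon/lat become the coordinates
coordinates(x) <- ~lon+lat
# Declare the coordinates as unprojected longitude/latitude
proj4string(x) <- CRS("+proj=longlat +ellps=WGS84")

class(x)
```

The remaining columns (here only name) stay in the object's attribute table.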
First, add the required libraries and create some example data.
library(sp)
library(raster)
e <- as(raster::extent(-180, 180, -90, 90), "SpatialPolygons")
properties <- spsample(e, 1000, type="random")
proj4string(properties) <- "+proj=longlat +ellps=WGS84"
schools <- spsample(e, 100, type="random")
proj4string(schools) <- "+proj=longlat +ellps=WGS84"
schools$ids <- paste0("school", 1:length(schools))
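Now we can create the distance matrix (in km, since longlat = TRUE) and add the schools' unique identifiers to the matrix column names. The rows (properties) and columns (schools) are different point sets, so no diagonal needs to be set to NA; doing so would discard valid property-to-school distances.
d <- spDists(x = properties, y = schools, longlat = TRUE)
colnames(d) <- schools$ids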
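There are certainly more elegant ways to do this, but for simplicity we will use a for loop to fill two vectors representing the distance and the ID. We use which.min to pull the index of the minimum distance in row i. The iterator is based on the matrix rows, since they represent the property features.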
sdist <- rep(NA, nrow(d))
sid <- rep(NA, nrow(d))
for(i in 1:nrow(d)) {
srow <- d[i,]
sdist[i] <- srow[which.min(srow)]
sid[i] <- names(srow)[which.min(srow)]
}
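One possible loop-free variant of the same lookup is sketched below. The small matrix d is a made-up stand-in for the distance matrix (rows are properties, columns are schools), so the values and column names are purely illustrative.

```r
# Hypothetical 2 x 3 distance matrix with school IDs as column names
d <- matrix(c(5, 2, 9,
              1, 7, 3),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("school1", "school2", "school3")))

# Column index of the minimum in each row
idx <- unname(apply(d, 1, which.min))

# Pick d[i, idx[i]] for every row via matrix indexing
sdist <- d[cbind(seq_len(nrow(d)), idx)]
# Look up the matching school IDs
sid <- colnames(d)[idx]
```

Here sdist and sid play the same roles as the vectors filled by the for loop.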
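We can then assign the resulting vectors to the properties SpatialPointsDataFrame. The data.frame in the @data slot now has columns representing the distance to the nearest school and the school ID.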
properties$school <- sid
properties$dist <- sdist
Here we can plot the results.
par(mfrow=c(2,1))
plot(properties, pch=19, cex=0.5)
plot(schools, pch=19, col="red", add=TRUE)
plot(e, add=TRUE)
title("random properties (black) and schools (red)", cex=0.5)
plot(properties, col="white")
plot(properties[1,], pch=19, cex=2, add=TRUE)
plot(schools[which(schools$ids %in% properties[1,]$school),],
pch=19, cex=2, col="red", add=TRUE)
plot(e, add=TRUE)
title("Property 1 (black) and closest school (red)", cex=0.5)
sidx <- which(schools$ids %in% properties[1,]$school)
text(coordinates(schools[sidx,]),
label = schools[sidx,]$ids, col="blue", cex=1)