我的DF有大约100万条记录。每个记录包括纬度和经度,记录按示例数据分组(除了更大的组)
data.frame(Latitude=c(-30.25,-30.89,-30.48,-30.10),
Longitude=c(116.321,116.98,116.78,116.38),
grp=c('a','a','b','b'))
在每组中,我需要找到任意两组坐标之间的最大距离。一旦我在DF中拥有坐标的所有组合,我可以计算距离没有问题,但不能有效地将每个组合变成看起来像这样的DF
data.frame(Latitude1=c(-30.25,-30.25,-30.89,-30.89,-30.48,-30.48,-30.10,-30.10),
Longitude1=c(116.321,116.32,116.98,116.98,116.78,116.78,116.38,116.38),
Latitude2=c(-30.25,-30.89,-30.25,-30.89,-30.48,-30.10,-30.48,-30.10),
Longitude2=c(116.321,116.98,116.98,116.321,116.78,116.38,116.38,116.78),
grp=c('a','a','a','a','b','b','b','b'))
我已经编写了一个嵌套循环来执行此操作,但它非常慢,而且我确定有更好的方法。 我看了重复列和使用expand.grid,但可以找到如何使用多个因素 任何帮助,将不胜感激。感谢
答案 0 :(得分:2)
这样的事情怎么样才能让你开始。我们使用geosphere::distm
来计算距离(此处为测地距离):
根据data.frame
grp
分组
lst <- split(df, df$grp)
计算测地距离
library(geosphere);
dist <- lapply(lst, function(x) distm(x[, c("Longitude", "Latitude")]));
结果是list
对称距离矩阵,其中行/列对应于记录。
dist;
#$a
# [,1] [,2]
#[1,] 0.00 95029.27
#[2,] 95029.27 0.00
#
#$b
# [,1] [,2]
#[1,] 0.00 57056.28
#[2,] 57056.28 0.00
然后,您可以根据每组的最小距离来过滤记录。你每组只给2分,所以提取最大距离是微不足道的,因为只有一个。
答案 1 :(得分:1)
如果您习惯使用开发/未发布的软件包,我已编写spatialdatatable
来对data.table
个对象进行有效的地理*计算。
这是一个处理100,000行数据的解决方案。步骤是
library(data.table)
# devtools::install_github("SymbolixAU/spatialdatatable")
library(spatialdatatable)
## generate random data
lons <- sample(0:180, 1e5, replace = T)
lats <- sample(-90:1, 1e5, replace = T)
grp <- sample(letters, 1e5, replace = T)
df <- data.frame(lon = lons, lat = lats, grp = grp)
## set as a data.table object, and assign an 'id' to each point
setDT(df)
df[, id := .I]
## 1. join the df to itself to give all points to all other points
df <- df[
df
, on = "grp"
, nomatch = 0
, allow.cartesian = T
][id != i.id] ## remove points joined with themselves
## 2. calculate distances
df[, dist := spatialdatatable::dtHaversine(lat, lon, i.lat, i.lon)]
## 3. select greatest distance per group
df[ df[, .I[which.max(dist)], by = grp]$V1 ][order(grp)]
# lon lat grp id i.lon i.lat i.id dist
# 1: 1 0 a 27726 180 0 10996 19903920
# 2: 1 1 b 63425 180 -3 57218 19766508
# 3: 1 1 c 18255 177 -2 56 19556799
# 4: 0 -1 d 43560 179 0 8518 19857865
# 5: 178 -2 e 37485 0 0 34482 19700640
# 6: 1 -2 f 79879 180 1 70765 19857889
# 7: 178 1 g 84268 1 -3 44148 19614379
# 8: 178 -5 h 49310 1 1 1306 19459455
# 9: 0 1 i 92786 179 -2 55584 19857889
# 10: 180 0 j 92704 0 0 36757 20015115
# 11: 0 -1 k 75760 180 0 71050 19903920
# 12: 0 -1 l 42202 180 0 10839 19903920
# 13: 0 1 m 73069 177 -2 2708 19663598
# 14: 0 1 n 10830 180 -1 1236 20015115
# 15: 3 -2 o 43380 180 1 3829 19663598
# 16: 179 1 p 95740 0 -1 3061 19903937
# 17: 0 -1 q 49476 180 0 18257 19903920
# 18: 180 0 r 96154 1 0 42435 19903920
# 19: 180 -1 s 82115 1 0 47784 19857865
# 20: 178 -2 t 42861 0 0 22020 19700640
# 21: 180 0 u 22965 0 -1 12158 19903920
# 22: 178 0 v 18557 0 -2 17457 19700640
# 23: 178 -2 w 58321 1 -1 13906 19543390
# 24: 0 -1 x 93181 177 -3 67084 19459211
# 25: 0 -1 y 46491 178 1 5548 19792759
# 26: 3 1 z 43109 180 -3 769 19614379
library(geosphere)