I have a dataset of geographic coordinates for individuals, like this:
Person Latitude Longitude
1 46.0614 -23.9386
2 48.1792 63.1136
3 59.9289 66.3883
4 42.8167 58.3167
5 43.1167 63.25
I plan to use the geosphere package in R to compute geographic proximity at the dyad level. To do that, I need to create a dataset like this:
Person1 Person2 LatitudeP1 LongitudeP1 LatitudeP2 LongitudeP2
1 2 46.0614 -23.9386 48.1792 63.1136
1 3 46.0614 -23.9386 59.9289 66.3883
1 4 46.0614 -23.9386 42.8167 58.3167
1 5 46.0614 -23.9386 43.1167 63.25
2 3 48.1792 63.1136 59.9289 66.3883
2 4 48.1792 63.1136 42.8167 58.3167
2 5 48.1792 63.1136 43.1167 63.25
3 4 59.9289 66.3883 42.8167 58.3167
3 5 59.9289 66.3883 43.1167 63.25
4 5 42.8167 58.3167 43.1167 63.25
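So the resulting data has one row for every possible dyad in the dataset, and it includes the coordinates of both individuals in the dyad. "LatitudeP1" and "LongitudeP1" are the coordinates of "Person1" in the dyad, and "LatitudeP2" and "LongitudeP2" are the coordinates of "Person2". It does not matter which ID is listed as Person1 and which as Person2, because geographic distance is not a directed relationship.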
Answer 0 (score: 2)
Just take all possible combinations of 1 to 5 (Person) with combn, then subset the latitude/longitude values from the original data:
dat <- read.table(header = TRUE, text="Person Latitude Longitude
1 46.0614 -23.9386
2 48.1792 63.1136
3 59.9289 66.3883
4 42.8167 58.3167
5 43.1167 63.25")
tmp <- t(combn(nrow(dat), 2))  # all unordered pairs of row indices, one pair per row
# [,1] [,2]
# [1,] 1 2
# [2,] 1 3
# [3,] 1 4
# [4,] 1 5
# [5,] 2 3
# [6,] 2 4
# [7,] 2 5
# [8,] 3 4
# [9,] 3 5
# [10,] 4 5
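# tmp holds row indices; look up the actual Person IDs (identical here, since Person == row number)
ids <- cbind(dat$Person[tmp[, 1]], dat$Person[tmp[, 2]])
# For each member of the dyad (column 1, then column 2 of tmp), pull Latitude and Longitude
res <- cbind(ids,
             do.call('cbind', lapply(1:2, function(x)
               mapply(`[`, dat[, 2:3], MoreArgs = list(i = tmp[, x])))))
colnames(res) <- c('Person1','Person2','LatitudeP1','LongitudeP1',
                   'LatitudeP2','LongitudeP2')
data.frame(res)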
# Person1 Person2 LatitudeP1 LongitudeP1 LatitudeP2 LongitudeP2
# 1 1 2 46.0614 -23.9386 48.1792 63.1136
# 2 1 3 46.0614 -23.9386 59.9289 66.3883
# 3 1 4 46.0614 -23.9386 42.8167 58.3167
# 4 1 5 46.0614 -23.9386 43.1167 63.2500
# 5 2 3 48.1792 63.1136 59.9289 66.3883
# 6 2 4 48.1792 63.1136 42.8167 58.3167
# 7 2 5 48.1792 63.1136 43.1167 63.2500
# 8 3 4 59.9289 66.3883 42.8167 58.3167
# 9 3 5 59.9289 66.3883 43.1167 63.2500
# 10 4 5 42.8167 58.3167 43.1167 63.2500
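For comparison, here is a minimal sketch that builds the same table by indexing rows of the data directly instead of going through mapply(). It rebuilds dat from the question so that it runs on its own; pairs, p1, p2 and dyads are illustrative names. Because the IDs are read from the Person column rather than taken from the row indices, this version would also work if the IDs were not simply 1..n.

```r
# Question's data, rebuilt so the sketch is self-contained
dat <- data.frame(
  Person    = 1:5,
  Latitude  = c(46.0614, 48.1792, 59.9289, 42.8167, 43.1167),
  Longitude = c(-23.9386, 63.1136, 66.3883, 58.3167, 63.25)
)

# Each row of `pairs` is one unordered dyad of row indices (first < second)
pairs <- t(combn(nrow(dat), 2))
p1 <- dat[pairs[, 1], ]   # rows for the first member of each dyad
p2 <- dat[pairs[, 2], ]   # rows for the second member of each dyad

dyads <- data.frame(
  Person1     = p1$Person,
  Person2     = p2$Person,
  LatitudeP1  = p1$Latitude,
  LongitudeP1 = p1$Longitude,
  LatitudeP2  = p2$Latitude,
  LongitudeP2 = p2$Longitude
)
dyads
```

With geosphere installed, a distance column could then be added along the lines of dyads$dist <- geosphere::distHaversine(as.matrix(dyads[, c("LongitudeP1", "LatitudeP1")]), as.matrix(dyads[, c("LongitudeP2", "LatitudeP2")])), passing longitude first.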
Answer 1 (score: 1)
If you want pairwise distances and you are using the geosphere package, why not just use distm(...) instead of jumping through all these hoops:
# df is the dataset from your question
library(geosphere)
distm(df[,3:2],fun=distHaversine) # distance in *meters*
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 6224407.2 5743824 6243068.1 6553157.4
# [2,] 6224407 0.0 1324950 704260.1 563654.6
# [3,] 5743824 1324949.8 0 1982326.1 1883584.1
# [4,] 6243068 704260.1 1982326 0.0 403183.0
# [5,] 6553157 563654.6 1883584 403183.0 0.0
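If the long, one-row-per-dyad format is still needed, the upper triangle of this matrix can be reshaped into it. The sketch below is an illustration under one assumption: so that it runs without geosphere, haversine() is a hypothetical hand-written stand-in for distHaversine using the same default radius (6378137 m). In practice you would take m straight from distm(df[,3:2], fun=distHaversine).

```r
# Hypothetical stand-in for geosphere::distHaversine (default radius 6378137 m),
# written out so the sketch runs without geosphere installed. Inputs in degrees.
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(a, 1)))  # great-circle distance in meters
}

# The question's data
df <- data.frame(
  Person    = 1:5,
  Latitude  = c(46.0614, 48.1792, 59.9289, 42.8167, 43.1167),
  Longitude = c(-23.9386, 63.1136, 66.3883, 58.3167, 63.25)
)

n <- nrow(df)
# Full symmetric distance matrix, analogous to distm(df[, 3:2], fun = distHaversine)
m <- outer(seq_len(n), seq_len(n), function(i, j)
  haversine(df$Longitude[i], df$Latitude[i], df$Longitude[j], df$Latitude[j]))

# Keep each dyad once (row < column), ordered like combn(): 1-2, 1-3, ..., 4-5
idx <- which(upper.tri(m), arr.ind = TRUE)
idx <- idx[order(idx[, "row"], idx[, "col"]), ]

dyad_dist <- data.frame(
  Person1  = df$Person[idx[, "row"]],
  Person2  = df$Person[idx[, "col"]],
  distance = m[idx]  # matrix indexing by (row, col) pairs
)
dyad_dist
```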
You can also use the fossil package.
library(fossil)
earth.dist(df[,3:2],dist=FALSE) # distance in *kilometers*
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0.000 6219.1967 5739.016 6237.8420 6547.6718
# [2,] 6219.197 0.0000 1323.841 703.6706 563.1828
# [3,] 5739.016 1323.8407 0.000 1980.6667 1882.0073
# [4,] 6237.842 703.6706 1980.667 0.0000 402.8455
# [5,] 6547.672 563.1828 1882.007 402.8455 0.0000
Note that these functions expect longitude first and then latitude, so you have to pass columns 3:2, not 2:3.
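The two packages disagree by a nearly constant factor of about 1.00084. The check below uses only numbers copied from the two printed matrices above. Since distHaversine defaults to a radius of 6378137 m, a constant ratio is consistent with fossil using a slightly smaller earth radius of roughly 6372.8 km; that radius is an inference from the output, not a value taken from fossil's documentation.

```r
# Distances for the same dyads, copied from the printed outputs above
# Pairs: 1-2, 1-3, 2-3, 4-5
geosphere_m <- c(6224407.2, 5743824, 1324949.8, 403183.0)  # distm(..., fun = distHaversine), meters
fossil_km   <- c(6219.1967, 5739.016, 1323.8407, 402.8455)  # earth.dist(...), kilometers

# Ratio of the two results, after converting fossil's kilometers to meters
ratio <- geosphere_m / (fossil_km * 1000)
ratio

# If both use the same spherical formula, the ratio equals the ratio of radii,
# so the radius implied for fossil is distHaversine's default divided by it
implied_radius_km <- 6378.137 / mean(ratio)
implied_radius_km
```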
Edit: in response to the OP's comment.
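"Edge list" sounds like you want to end up with an igraph object. You can use the distance matrix as the adjacency matrix in igraph, and the distances will automatically fill in the edge weights in the edge list.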
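library(igraph)
library(geosphere)
# Haversine distance matrix in meters; df has columns Person, Latitude, Longitude
d <- distm(df[, 3:2], fun = distHaversine)
# Nonzero entries become weighted edges; the zero diagonal adds no self-loops
g <- graph_from_adjacency_matrix(d, mode = "undirected", weighted = TRUE)
set.seed(1)  # for a reproducible plot
# layout_with_fr() pulls heavily weighted vertices together, so pass inverse
# distance (rescaled so the closest pair gets weight 1) to keep nearby people close
plot(g, layout = layout_with_fr(g, weights = min(E(g)$weight) / E(g)$weight))
igraph::as_data_frame(g, what = "edges")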
# from to weight
# 1 1 2 6224407.2
# 2 1 3 5743824.5
# 3 1 4 6243068.1
# 4 1 5 6553157.4
# 5 2 3 1324949.8
# 6 2 4 704260.1
# 7 2 5 563654.6
# 8 3 4 1982326.1
# 9 3 5 1883584.1
# 10 4 5 403183.0