I have a large dataset of coordinates and I want to compute the Euclidean distances between them. A sample looks like this:
df <- data.frame("name" = c("a","b","c","a","e"), "type" = c("me","me","me","we", "we"),
"x" = c(64.044,63.722,64.359,65.373, 65.122),"y" = c(51.615,52.849,53.119,51.805,52.78),
"z" = c(33.423,32.671,31.662,34.158,35.26))
name type x      y      z
a    me   64.044 51.615 33.423
b    me   63.722 52.849 32.671
c    me   64.359 53.119 31.662
d    we   65.373 51.805 34.158
e    we   65.122 52.78  35.26
I want to compute the Euclidean distance only between rows whose name values belong to different types. I wrote this code:
require("distances")
for (i in 1:nrow(df)) {
if(!(df$type[i]%in%df$type[i+1])){
d <- distances(df[,3:5])
}
}
However, this is wrong. My ideal result should look like the following, where d stands for the distance between the named rows:
a.me b.me c.me a.we e.we
a.me 0 0 0 d d
b.me 0 0 0 d d
c.me 0 0 0 d d
a.we d d d 0 0
e.we d d d 0 0
Also, my dataset is very large, so I don't think a loop is the best option. Can anyone help me solve this?
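Answer 0 (score: 3)
Here is one solution. Depending on the size of the data it may be computationally expensive, because it first computes the entire distance matrix and only afterwards keeps the entries that are needed: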
dist_mat <- dist(df[3:5], diag = TRUE, upper = TRUE)
dist_mat <- as.matrix(dist_mat)
names_vec <- paste(df$name, df$type, sep = "_")
dimnames(dist_mat) <- list(names_vec, names_vec)
dist_mat <-
sapply(colnames(dist_mat),
function(x) ifelse(grepl(strsplit(x, "_")[[1]][2],
row.names(dist_mat)), NA, dist_mat[,x]))
row.names(dist_mat) <- names_vec
dist_mat
# a_me b_me c_me d_we e_we
# a_me NA NA NA 1.530544 2.427731
# b_me NA NA NA 2.454976 2.944093
# c_me NA NA NA 2.997467 3.693602
# d_we 1.530544 2.454976 2.997467 NA NA
# e_we 2.427731 2.944093 3.693602 NA NA
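If the full matrix is too expensive, one possible alternative is to compute only the cross-type block. The following is a minimal sketch, not part of the original answer, and it assumes exactly two types ("me" and "we") and the sample data with "d" as the fourth name. The variable names `coords`, `A`, `B` and `cross` are illustrative. It uses the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, which can lose some precision when coordinates are large relative to the distances.

```r
# Sample data as used in the answers (fourth name is "d")
df <- data.frame(
  name = c("a", "b", "c", "d", "e"),
  type = c("me", "me", "me", "we", "we"),
  x = c(64.044, 63.722, 64.359, 65.373, 65.122),
  y = c(51.615, 52.849, 53.119, 51.805, 52.78),
  z = c(33.423, 32.671, 31.662, 34.158, 35.26)
)

# Coordinate matrix with "name_type" row labels
coords <- as.matrix(df[, c("x", "y", "z")])
rownames(coords) <- paste(df$name, df$type, sep = "_")

# Assumption: only two types exist, so one logical mask splits the rows
is_me <- df$type == "me"
A <- coords[is_me, , drop = FALSE]
B <- coords[!is_me, , drop = FALSE]

# Squared distances between every row of A and every row of B
sq <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)

# Clamp tiny negative values caused by rounding before taking the root
cross <- sqrt(pmax(sq, 0))
cross
```

This only allocates an nrow(A) by nrow(B) matrix instead of the full square one. With more than two types, the same step would have to be repeated for each pair of types.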
Answer 1 (score: 1)
This does the trick (note that it returns the distances within each type and NA across types):
df <- data.frame("name" = c("a","b","c","d","e"), "type" = c("me","me","me","we", "we"),
"x" = c(64.044,63.722,64.359,65.373, 65.122),"y" = c(51.615,52.849,53.119,51.805,52.78),
"z" = c(33.423,32.671,31.662,34.158,35.26))
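# lapply over the levels of the type; factor() is needed because
# data.frame() no longer converts strings to factors by default (R >= 4.0.0)
x <- lapply(levels(factor(df$type)), function(level) {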
# select the matching rows and columns and convert
mat <- as.matrix(df[as.character(df$type) == level, 3:5])
# names are set as row names for dist to use
row.names(mat) <- paste(df$name[as.character(df$type) == level],
level,
sep = "_")
# measuring the distance
dist <- dist(mat, method = "euclidean", diag = TRUE, upper = TRUE)
# converting distance to matrix
as.matrix(dist)
})
# bind the list to one matrix
x <- plyr::rbind.fill.matrix(x)
# add rownames
row.names(x) <- colnames(x)
x
#          a_me     b_me     c_me     d_we     e_we
# a_me 0.000000 1.480522 2.337170       NA       NA
# b_me 1.480522 0.000000 1.223417       NA       NA
# c_me 2.337170 1.223417 0.000000       NA       NA
# d_we       NA       NA       NA 0.000000 1.492659
# e_we       NA       NA       NA 1.492659 0.000000
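For comparison, both layouts (within-type only, as above, and cross-type only, as in the question) can be derived from one full distance matrix with a logical mask. This is a hedged sketch rather than either answerer's method. It assumes the data is small enough for the full matrix to fit in memory, and the names `full`, `same_type`, `cross_only` and `within_only` are illustrative.

```r
# Sample data as used in this answer
df <- data.frame(
  name = c("a", "b", "c", "d", "e"),
  type = c("me", "me", "me", "we", "we"),
  x = c(64.044, 63.722, 64.359, 65.373, 65.122),
  y = c(51.615, 52.849, 53.119, 51.805, 52.78),
  z = c(33.423, 32.671, 31.662, 34.158, 35.26)
)

# Full Euclidean distance matrix with "name_type" labels
labels <- paste(df$name, df$type, sep = "_")
full <- as.matrix(dist(df[, c("x", "y", "z")]))
dimnames(full) <- list(labels, labels)

# TRUE where the row and the column belong to the same type
same_type <- outer(df$type, df$type, "==")

# Layout asked for in the question: 0 within a type, distance across types
cross_only <- full
cross_only[same_type] <- 0

# Layout produced by this answer: distance within a type, NA across types
within_only <- full
within_only[!same_type] <- NA

cross_only
within_only
```

Comparing the type column directly avoids matching type strings inside the row labels with a pattern, which could misfire if a name happens to contain a type string.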