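Similar to "canberra distance - inconsistent results", I wrote my own distance calculation, but I would like to run it on a larger data set and then build a distance matrix from the results.
My initial function is: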
canb.dist <- function(x, j) sum((abs(x-j))/(abs(x)+abs(j)))
Now I want to apply this function to every pair of rows in a data frame and build a distance matrix from the results. Suppose my data is:
data<-data.frame(replicate(500,sample(1:100,50,rep=TRUE)))
I am stuck on the next part: how to apply the function to each pair of rows and build a matrix that essentially mimics
dist(data,method="canberra")
I tried:
for (y in 1:50)
{
for (z in 2:50)
{
canb.dist(data[y,1:500],data[z,1:500])
}
}
But obviously that does not work. Is there a way to iterate over every pair and reproduce the distance matrix manually?
Answer 0 (score: 2)
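You can use combn to create the pairs of rows and compute the Canberra distance for each pair. Then use the sparse Matrix package to turn the indices and values into a matrix, and convert that matrix to the dist class.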
#OP's data
set.seed(1)
canb.dist <- function(x, j) sum((abs(x-j))/(abs(x)+abs(j)))
data <- data.frame(replicate(500,sample(1:100,50,rep=TRUE)))
refdist <- dist(data, method="canberra")
#convert to matrix
mat <- as.matrix(data)
#sequence of row indices
rowidx <- seq_len(nrow(mat))
#calculate OP's Canberra dist for each pair of rows
triangular <- combn(rowidx, 2, function(x) c(x[1], x[2], canb.dist(mat[x[1],], mat[x[2],])))
#construct the matrix given the indices and values using Matrix library,
#convert into a matrix before converting into a dist class
#the values refer to the diagonal, upper triangular and lower triangular
library(Matrix)
ansdist <- as.dist(as.matrix(sparseMatrix(
i=c(rowidx, triangular[1,], triangular[2,]),
j=c(rowidx, triangular[2,], triangular[1,]),
x=c(rep(0, length(rowidx)), triangular[3,], triangular[3,])
)))
#idea from http://stackoverflow.com/questions/17375056/r-sparse-matrix-conversion/17375747#17375747
range(as.matrix(refdist) - as.matrix(ansdist))
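As a possible alternative that stays closer to the loop in the question, here is a minimal base R sketch that avoids the Matrix package. The loop in the question computes each distance but never stores it, and its inner index starts at 2 rather than at y + 1. The version below writes every result into a preallocated square matrix and then converts it with as.dist. The helper name pairwise.canb is hypothetical and was made up for this illustration. The sketch assumes the data contain no pair of zeros in the same column, since that would give 0/0 in canb.dist.

```r
# OP's distance function, unchanged in behaviour
canb.dist <- function(x, j) sum(abs(x - j) / (abs(x) + abs(j)))

# OP's data
set.seed(1)
data <- data.frame(replicate(500, sample(1:100, 50, replace = TRUE)))
mat <- as.matrix(data)  # rows of a numeric matrix index as plain vectors

# pairwise.canb is a hypothetical helper, not part of any package
pairwise.canb <- function(m) {
  n <- nrow(m)
  out <- matrix(0, n, n)           # diagonal stays 0
  for (y in seq_len(n - 1)) {
    for (z in (y + 1):n) {         # visit each unordered pair once
      d <- canb.dist(m[y, ], m[z, ])
      out[y, z] <- d               # upper triangle
      out[z, y] <- d               # mirrored into the lower triangle
    }
  }
  as.dist(out)
}

loopdist <- pairwise.canb(mat)

# compare with the built-in implementation, up to floating point tolerance
all.equal(as.vector(loopdist), as.vector(dist(mat, method = "canberra")))
```

For 50 rows the double loop is fast enough. For much larger data the combn approach above or the built-in dist is likely the better choice.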