Manual calculation of a Canberra distance matrix

Date: 2017-03-23 01:16:47

Tags: r distance

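Similar to "canberra distance - inconsistent results", I have written my own distance calculation, but I would like to apply it to a larger data set and then build a distance matrix from the results.

My initial function is: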

canb.dist <- function(x, j) sum((abs(x-j))/(abs(x)+abs(j)))

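Now I would like to apply this function to every pair of rows in a data frame and then build a distance matrix from the results. Let's say my data is: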

data<-data.frame(replicate(500,sample(1:100,50,rep=TRUE)))

I am struggling with the next part: how to apply the function to each pair of rows and then create a matrix that essentially mimics

dist(data,method="canberra")

I have tried:

for (y in 1:50)
{
    for (z in 2:50)
    {
    canb.dist(data[y,1:500],data[z,1:500])
    }
}

But it clearly does not work. Is there a way to loop over every pair and reproduce the distance matrix manually?

1 answer:

Answer 0: (score: 2)

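You can use combn to create the pairs of rows and compute the Canberra distance for each pair. Then, to get an object of class dist, use sparseMatrix from the Matrix package to turn the indices and values into a matrix and convert that with as.dist.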
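As a small illustration of the first step, here is a sketch on a toy case of four rows (the names `pairs` and `withvalue` are only for this example). combn enumerates every unordered pair once, and when given a function it returns one column per pair:

```r
# Toy example: all unordered pairs of the row indices 1..4.
# Each column is one pair (i, j) with i < j.
pairs <- combn(4, 2)
pairs
ncol(pairs)   # choose(4, 2) columns

# With a function argument, combn applies it to every pair and
# binds the results column-wise. The third row here is a stand-in
# for the distance value.
withvalue <- combn(4, 2, function(p) c(p[1], p[2], p[1] + p[2]))
dim(withvalue)
```

The full solution below uses the same pattern, with the Canberra distance in the third row.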

#OP's data
set.seed(1)
canb.dist <- function(x, j) sum((abs(x-j))/(abs(x)+abs(j)))
data <- data.frame(replicate(500,sample(1:100,50,rep=TRUE)))
refdist <- dist(data, method="canberra")

#convert to matrix
mat <- as.matrix(data)

#sequence of row indices
rowidx <- seq_len(nrow(mat))

#calculate OP's Canberra dist for each pair of rows
triangular <- combn(rowidx, 2, function(x) c(x[1], x[2], canb.dist(mat[x[1],], mat[x[2],])))

#construct the matrix given the indices and values using Matrix library,
#convert into a matrix before converting into a dist class
#the three blocks of values are the diagonal, the upper triangle (i < j)
#and the lower triangle (i > j)
library(Matrix)
ansdist <- as.dist(as.matrix(sparseMatrix(
    i=c(rowidx, triangular[1,], triangular[2,]), 
    j=c(rowidx, triangular[2,], triangular[1,]),
    x=c(rep(0, length(rowidx)), triangular[3,], triangular[3,])
)))

#idea from http://stackoverflow.com/questions/17375056/r-sparse-matrix-conversion/17375747#17375747
range(as.matrix(refdist) - as.matrix(ansdist))
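
If you would rather keep the explicit loop from the question, a minimal sketch along these lines should also work. The loop in the question computes each value but never stores it. The version below writes every result into a pre-allocated matrix and visits each pair only once. The names `res` and `loopdist` are just for this example, and it assumes the data contain no pair of entries that are both zero (otherwise the term is 0/0, which dist handles specially).

```r
# The question's distance function, unchanged in behaviour.
canb.dist <- function(x, j) sum(abs(x - j) / (abs(x) + abs(j)))

set.seed(1)
data <- data.frame(replicate(500, sample(1:100, 50, replace = TRUE)))
mat <- as.matrix(data)   # rows of a matrix are plain numeric vectors
n <- nrow(mat)

# Pre-allocate an n x n matrix of zeros; the diagonal stays 0.
res <- matrix(0, n, n)

# Visit each unordered pair once (z > y) and store the value symmetrically.
for (y in seq_len(n - 1)) {
  for (z in (y + 1):n) {
    d <- canb.dist(mat[y, ], mat[z, ])
    res[y, z] <- d
    res[z, y] <- d
  }
}

# as.dist keeps the lower triangle of the symmetric matrix.
loopdist <- as.dist(res)

# Compare with the built-in implementation, ignoring attributes.
all.equal(as.vector(loopdist), as.vector(dist(data, method = "canberra")))
```

This needs no extra package. For a few dozen rows the double loop is fast enough, though it will be slower than dist on large inputs.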