Mahalanobis distance in R

Time: 2013-09-06 13:25:20

Tags: r distance

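I found the mahalanobis.dist function in the package StatMatch (http://cran.r-project.org/web/packages/StatMatch/StatMatch.pdf), but it doesn't do exactly what I want. It seems to calculate the Mahalanobis distance from each observation in data.y to each observation in data.x.

I want to calculate the Mahalanobis distance from one observation in data.y to all the observations in data.x. In other words, I want the Mahalanobis distance from one point to a "cloud" of points, if that makes sense. What I'm really after is the probability that an observation belongs to another group of observations.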
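For reference, the quantity described above is usually written as follows. Here μ and Σ are the mean vector and covariance matrix of the cloud (data.x), both estimated from the data:

$$D_M(x) = \sqrt{(x - \mu)^\top \Sigma^{-1} (x - \mu)}$$

If you additionally assume that the cloud is (approximately) multivariate normal with p variables, then D_M(x)^2 is approximately chi-squared distributed with p degrees of freedom. That is one common way to connect the distance to the "probability of belonging" idea. Note that this relies on the normality assumption and is not a property of the distance itself.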

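This person (http://people.revoledu.com/kardi/tutorial/Similarity/MahalanobisDistance.html) seems to be doing exactly this. I tried to replicate his process in R, but it fails when I get to the last step of the equation:

mahaldist = sqrt((inversepooledcov %*% t(meandiffmatrix)) %*% meandiffmatrix)

All the code I'm using is here: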

a = rbind(c(2,2), c(2,5), c(6,5),c(7,3))
colnames(a) = c('x', 'y')
b = rbind(c(6,5),c(3,4))
colnames(b) = c('x', 'y')

acov = cov(a)
bcov = cov(b)

meandiff1 = mean(a[,1]) - mean(b[,1])
meandiff2 = mean(a[,2]) - mean(b[,2])
meandiffmatrix = rbind(c(meandiff1,meandiff2))

totaldata = dim(a)[1] + dim(b)[1]
pooledcov = (dim(a)[1]/totaldata * acov) + (dim(b)[1]/totaldata * bcov)
inversepooledcov = solve(pooledcov)

mahaldist = sqrt((inversepooledcov %*% t(meandiffmatrix)) %*% meandiffmatrix)

7 answers:

Answer 0 (score: 6)

How about using the mahalanobis function from the stats package:

 mahalanobis(x, center, cov, inverted = FALSE, ...)
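Here is a minimal sketch of this applied to the question's data. Both the center and the covariance are taken from the cloud a, and each row of b is measured against it. This uses cov(a) alone, not the pooled covariance from the revoledu tutorial. mahalanobis() returns squared distances, which is why sqrt() appears below. The pchisq() step is an extra assumption (approximately multivariate-normal data), included only to illustrate the "probability of belonging" idea from the question.

```r
# Cloud of reference observations (data.x in the question)
a <- rbind(c(2, 2), c(2, 5), c(6, 5), c(7, 3))
colnames(a) <- c("x", "y")
# Points to test against the cloud (data.y in the question)
b <- rbind(c(6, 5), c(3, 4))
colnames(b) <- c("x", "y")

# Squared Mahalanobis distance of each row of b from the cloud's mean,
# using the cloud's own covariance matrix
d2 <- mahalanobis(b, center = colMeans(a), cov = cov(a))
d <- sqrt(d2)

# Assumption: approximately multivariate-normal cloud, so d2 is roughly
# chi-squared with df = number of variables; this is the upper-tail probability
p <- pchisq(d2, df = ncol(a), lower.tail = FALSE)

d
p
```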

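Answer 1 (score: 5)

I had been trying this out from the same website you were looking at, and then I stumbled upon this question. I managed to get the script to work, but I get a different result.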

#WORKING EXAMPLE
#MAHALANOBIS DIST OF TWO MATRICES

#define matrix
mat1<-matrix(data=c(2,2,6,7,4,6,5,4,2,1,2,5,5,3,7,4,3,6,5,3),nrow=10)
mat2<-matrix(data=c(6,7,8,5,5,5,4,7,6,4),nrow=5)
#center data
mat1.1<-scale(mat1,center=T,scale=F)
mat2.1<-scale(mat2,center=T,scale=F)
#cov matrix
mat1.2<-cov(mat1.1,method="pearson")
mat2.2<-cov(mat2.1,method="pearson")
n1<-nrow(mat1)
n2<-nrow(mat2)
n3<-n1+n2
#pooled matrix
mat3<-((n1/n3)*mat1.2) + ((n2/n3)*mat2.2)
#inverse pooled matrix
mat4<-solve(mat3)
#mean diff
mat5<-as.matrix((colMeans(mat1)-colMeans(mat2)))
#multiply
mat6<-t(mat5) %*% mat4
#multiply
sqrt(mat6 %*% mat5)

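I think the function mahalanobis() is meant for computing the Mahalanobis distances of the individuals (rows) of a single matrix from one center. The function pairwise.mahalanobis() from the package HDMD can compare two or more matrices and gives the Mahalanobis distances between them.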
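The steps in this answer can also be condensed into a small helper that hands the quadratic form to stats::mahalanobis(). This is a sketch: the name pooled_mahalanobis is hypothetical, and the size-weighted pooled covariance simply mirrors the answer above. Centering with scale() is omitted because it does not change cov().

```r
# Hypothetical helper: Mahalanobis distance between the means of two groups,
# using the size-weighted pooled covariance from the answer above
pooled_mahalanobis <- function(g1, g2) {
  g1 <- as.matrix(g1)
  g2 <- as.matrix(g2)
  n1 <- nrow(g1)
  n2 <- nrow(g2)
  n <- n1 + n2
  pooled <- (n1 / n) * cov(g1) + (n2 / n) * cov(g2)
  mean_diff <- colMeans(g1) - colMeans(g2)
  # mahalanobis() gives the SQUARED distance of mean_diff from the origin
  sqrt(mahalanobis(mean_diff, center = rep(0, length(mean_diff)), cov = pooled))
}

mat1 <- matrix(c(2,2,6,7,4,6,5,4,2,1,2,5,5,3,7,4,3,6,5,3), nrow = 10)
mat2 <- matrix(c(6,7,8,5,5,5,4,7,6,4), nrow = 5)
pooled_mahalanobis(mat1, mat2)
```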

Answer 2 (score: 1)

The output before taking the square root is:

inversepooledcov %*% t(meandiffmatrix) %*% meandiffmatrix
          [,1]        [,2]
x -0.004349227 -0.01304768
y  0.114529639  0.34358892

I think the problem is that you can't take the square root of a negative number, so you get NaN for the negative elements:

 sqrt(inversepooledcov %*% t(meandiffmatrix) %*% meandiffmatrix)
       [,1]      [,2]
x       NaN       NaN
y 0.3384223 0.5861646

Warning message:
In sqrt(inversepooledcov %*% t(meandiffmatrix) %*% meandiffmatrix) :
  NaNs produced
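The 2×2 result above comes from the order of the matrix products. inversepooledcov %*% t(meandiffmatrix) is 2×1, and multiplying it by the 1×2 meandiffmatrix produces an outer product instead of a scalar. If you put the row vector first, you get the 1×1 quadratic form the tutorial computes. Here is a minimal sketch that rebuilds the question's variables:

```r
a <- rbind(c(2, 2), c(2, 5), c(6, 5), c(7, 3))
b <- rbind(c(6, 5), c(3, 4))

# Pooled covariance weighted by group size, as in the question
n_a <- nrow(a)
n_b <- nrow(b)
n_tot <- n_a + n_b
pooledcov <- (n_a / n_tot) * cov(a) + (n_b / n_tot) * cov(b)
inversepooledcov <- solve(pooledcov)

# 1 x 2 row vector of mean differences
meandiffmatrix <- rbind(colMeans(a) - colMeans(b))

# Row vector first: (1 x 2) %*% (2 x 2) %*% (2 x 1) gives a 1 x 1 matrix
mahaldist <- sqrt(meandiffmatrix %*% inversepooledcov %*% t(meandiffmatrix))
mahaldist
```

The value under the square root equals the trace of the 2×2 matrix printed above (-0.004349227 + 0.34358892), which is another way to see that the two computations are related.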

Answer 3 (score: 1)

You can wrap the function stats::mahalanobis as below to output a Mahalanobis distance matrix (pairwise Mahalanobis distances):

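# x  - data frame (numeric columns)
# cx - covariance matrix; if not provided,
#      it will be estimated from the data
# Note: stats::mahalanobis() returns SQUARED distances,
#       so the resulting "dist" object holds squared distances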
mah <- function(x, cx = NULL) {
  if(is.null(cx)) cx <- cov(x)
  out <- lapply(1:nrow(x), function(i) {
    mahalanobis(x = x, 
                center = do.call("c", x[i, ]),
                cov = cx)
  })
  return(as.dist(do.call("rbind", out)))
}

You can then cluster your data and plot it, for example:

# Dummy data
x <- data.frame(X = c(rnorm(10, 0), rnorm(10, 5)), 
                Y = c(rnorm(10, 0), rnorm(10, 7)), 
                Z = c(rnorm(10, 0), rnorm(10, 12)))
rownames(x) <- LETTERS[1:20]
plot(x, pch = LETTERS[1:20])


# Compute the Mahalanobis distance matrix
d <- mah(x)
d

# Cluster and plot
hc <- hclust(d)
plot(hc)


Answer 4 (score: 0)

There is a very simple way to do this with the R package "biotools". In this case you get a matrix of squared Mahalanobis distances.

#Manly (2004, p.65-66)

x1 <- c(131.37, 132.37, 134.47, 135.50, 136.17)
x2 <- c(133.60, 132.70, 133.80, 132.30, 130.33)
x3 <- c(99.17, 99.07, 96.03, 94.53, 93.50)
x4 <- c(50.53, 50.23, 50.57, 51.97, 51.37)

#size (n x p) #Means 
x <- cbind(x1, x2, x3, x4) 

#size (p x p) #Variances and Covariances
Cov <- matrix(c(21.112,0.038,0.078,2.01, 0.038,23.486,5.2,2.844, 
        0.078,5.2,24.18,1.134, 2.01,2.844,1.134,10.154), 4, 4)

library(biotools)
Mahalanobis_Distance<-D2.dist(x, Cov)
print(Mahalanobis_Distance)

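Answer 5 (score: 0)

If the covariance matrix is the identity matrix, the squared Mahalanobis distance equals the squared Euclidean distance. If the variables covary, you can make the squared Mahalanobis and squared Euclidean distances equal by first whitening the matrix to remove the covariance. That is: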

#X is your matrix
if (!require("whitening")) install.packages("whitening")

X <- whitening::whiten(X) # default is ZCA (Mahalanobis) whitening
X_dist <- dist(X, diag = T, method = "euclidean")^2

You can confirm that this gives you the same distance matrix as the code Davit provided in one of the previous answers.
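The sketch below shows why this works without using the whitening package. It builds the ZCA whitening matrix W = S^(-1/2) directly from eigen() and checks that squared Euclidean distances between whitened rows match the squared Mahalanobis distances from stats::mahalanobis(). The example data are made up for illustration, and W is computed here rather than taken from whitening::whiten(). Centering is skipped because pairwise distances do not depend on it.

```r
set.seed(1)
# Made-up example data with correlated columns
X <- matrix(rnorm(60), ncol = 3)
X[, 2] <- X[, 2] + X[, 1]

S <- cov(X)
# ZCA whitening matrix W = S^(-1/2) from the eigen-decomposition of S
e <- eigen(S, symmetric = TRUE)
W <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
Z <- X %*% W

# Squared Euclidean distances between whitened rows
d_white <- as.matrix(dist(Z))^2

# Squared Mahalanobis distances of every row from row i, for each i
d_mahal <- t(sapply(seq_len(nrow(X)), function(i) mahalanobis(X, X[i, ], S)))

# Should be TRUE up to numerical tolerance
all.equal(d_white, d_mahal, check.attributes = FALSE)
```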

Answer 6 (score: 0)

You can now compute Mahalanobis distances with the metan package. See the functions mahala() and mahala_design() in the package documentation.