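I am trying to implement LDA from scratch using these notes: https://web.stanford.edu/class/stats202/content/lec9.pdf (page 34).
To check my implementation, I compare my priors, group means, and coefficients of linear discriminants with those from the lda() function in the MASS library. My priors and group means match the values produced by lda(). However, my coefficients differ. Here is a toy example: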

    group1 = replicate(3, rnorm(10, mean = 1))
    group2 = replicate(3, rnorm(15, mean = 2))
    x = rbind(group1, group2)
    colnames(x) = c(1, 2, 3)
    y = matrix(rep(1, 10), ncol = 1)
    y = rbind(y, matrix(rep(2, 15), ncol = 1))
    colnames(y) = 'y'
    library(MASS)
    xy = cbind(x, y)
    lda.fit = lda(y ~ ., as.data.frame(xy))

    LDA <- function(x, y) {
      group1_index = which( y == 1 )
      group2_index = which( y == 2 )

      #priors:
      prior_group1 = length(group1_index) / length(y)
      prior_group2 = length(group2_index) / length(y)
      print("Prior probabilities of groups:")
      print(c(prior_group1, prior_group2))

      #means:
      mean_group1 = colMeans(x[group1_index, ])
      mean_group2 = colMeans(x[group2_index, ])
      print("Group means:")
      print(rbind(mean_group1, mean_group2))

      #discriminant coefficients:
      x[group1_index, ] = sweep(x[group1_index, ], 2, mean_group1, "-")
      x[group2_index, ] = sweep(x[group2_index, ], 2, mean_group2, "-")
      sigma = solve(cov(x))
      disc_coeff = sigma %*% mean_group1
      print("Coefficients of linear discriminants:")
      print(disc_coeff)
    }

    LDA(x, y)

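So I compute the vector of deviations and estimate the covariance matrix as described in the slides. Below is the output of my implementation and of lda():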

    > LDA(x, y)
    [1] "Prior probabilities of groups:"
    [1] 0.4 0.6
    [1] "Group means:"
                        1         2        3
    mean_group1 0.9886488 0.7906502 1.228568
    mean_group2 1.9180531 1.9603046 2.175800
    [1] "Coefficients of linear discriminants:"
          [,1]
    1 2.449038
    2 2.687901
    3 3.108110

    > lda.fit
    Call:
    lda(y ~ ., data = as.data.frame(xy))

    Prior probabilities of groups:
      1   2
    0.4 0.6

    Group means:
            `1`       `2`      `3`
    1 0.9886488 0.7906502 1.228568
    2 1.9180531 1.9603046 2.175800

    Coefficients of linear discriminants:
              LD1
    `1` 0.7204303
    `2` 1.0612863
    `3` 1.0114053

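I tried looking at the source code of the lda() function; however, it is quite complicated for a novice R programmer.
Could someone advise whether the coefficients of linear discriminants should be computed differently? Or perhaps lda() does some post-processing that I have not included in my implementation?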
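
For comparison, here is a minimal sketch of one normalization that may account for the difference. It rests on assumptions that are not taken from the slides: the direction is the inverse pooled covariance times the difference of the group means (rather than a single group mean), the pooled covariance divides by n - K instead of the n - 1 used by cov(), and the vector is rescaled so that the within-group variance of the projected data equals 1. The helper name ld_direction and the seed are hypothetical, and the sign of the result is arbitrary.

```r
# Hypothetical helper: two-class discriminant direction, rescaled so that the
# pooled within-group variance of the projected data equals 1.
ld_direction <- function(x, y) {
  x <- as.matrix(x)
  y <- as.vector(y)
  groups <- sort(unique(y))
  stopifnot(length(groups) == 2)

  idx1 <- which(y == groups[1])
  idx2 <- which(y == groups[2])
  mean1 <- colMeans(x[idx1, , drop = FALSE])
  mean2 <- colMeans(x[idx2, , drop = FALSE])

  # Center each group on its own mean, then pool with divisor n - K (K = 2).
  centered <- x
  centered[idx1, ] <- sweep(x[idx1, , drop = FALSE], 2, mean1, "-")
  centered[idx2, ] <- sweep(x[idx2, , drop = FALSE], 2, mean2, "-")
  pooled <- crossprod(centered) / (nrow(x) - 2)

  # Direction: inverse pooled covariance times the difference of group means.
  direction <- solve(pooled, mean2 - mean1)

  # Rescale so that t(w) %*% pooled %*% w equals 1.
  w <- direction / sqrt(sum(direction * drop(pooled %*% direction)))
  list(coefficients = w, pooled = pooled)
}

set.seed(1)  # assumed seed, only to make this sketch reproducible
x <- rbind(replicate(3, rnorm(10, mean = 1)),
           replicate(3, rnorm(15, mean = 2)))
y <- rep(c(1, 2), times = c(10, 15))

fit <- ld_direction(x, y)
w <- fit$coefficients
print(w)
# Within-group variance along w; should be 1 up to rounding.
print(drop(t(w) %*% fit$pooled %*% w))
```

If this is the relevant step, abs(w) should agree with abs(lda.fit$scaling) for a fit on the same data; the comparison is on absolute values because the sign of the direction is not fixed.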