Question

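I have this data, and I want to extract the coefficients from the equations that the fit produces, so that I can plug in a new data point and see which group it is assigned to.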

library(MASS)
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
               Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
table(Iris$Sp[train])
## your answer may differ
##  c  s  v
## 22 23 30
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

I know I can get this:

> z
Call:
lda(Sp ~ ., data = Iris, prior = c(1, 1, 1)/3, subset = train)

Prior probabilities of groups:
    c         s         v 
0.3333333 0.3333333 0.3333333 

Group means:
  Sepal.L. Sepal.W. Petal.L.  Petal.W.
c 5.969231 2.753846 4.311538 1.3384615
s 5.075000 3.541667 1.500000 0.2583333
v 6.700000 2.936000 5.552000 1.9880000

Coefficients of linear discriminants:
                LD1        LD2
Sepal.L. -0.5458866  0.5215937
Sepal.W. -1.5312824  1.7891248
Petal.L.  1.8087255 -1.2637188
Petal.W.  2.8620894  3.2868849

Proportion of trace:
   LD1    LD2 
0.9893 0.0107

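But is there a way to get the equations themselves, so that I don't have to work out the result for each new observation by hand?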

Answer 1

Turning my comment into an answer. You need predict(); the help page for the predict.lda method in the MASS package contains exactly your example:

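library(MASS)
# pick 25 of the 50 flowers per species for training, the rest for testing
tr <- sample(1:50, 25)
train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])
test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
z <- lda(train, cl)
# predicted class for every row of the held-out data
predict(z, test)$class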
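Applied to the formula-based fit from the question, a minimal sketch for a single new observation could look like this. The measurements in new_obs and the fixed seed are made up for illustration; the only requirement is that the column names match those of the training data frame.

```r
library(MASS)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

# hypothetical new flower; names must match the predictors used in the fit
new_obs <- data.frame(Sepal.L. = 5.1, Sepal.W. = 3.5,
                      Petal.L. = 1.4, Petal.W. = 0.2)

p <- predict(z, new_obs)
p$class      # predicted group
p$posterior  # posterior probability of each group
p$x          # scores of the new point on LD1 and LD2
```

predict() returns the class, the posterior probabilities and the discriminant scores in one list, so there is usually no need to rebuild the equations yourself.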

Answer 2

The default method is "plug-in", so this is the relevant code from MASS:::predict.lda. object is the fitted object, and x is the newdata argument after conversion to a matrix:

# snipped preamble and error checking
means <- colSums(prior * object$means)
scaling <- object$scaling
x <- scale(x, center = means, scale = FALSE) %*% scaling
dm <- scale(object$means, center = means, scale = FALSE) %*% 
    scaling
method <- match.arg(method)
dimen <- if (missing(dimen)) 
    length(object$svd)
else min(dimen, length(object$svd))
N <- object$N
if (method == "plug-in") {
    dm <- dm[, 1L:dimen, drop = FALSE]
    dist <- matrix(0.5 * rowSums(dm^2) - log(prior), nrow(x), 
        length(prior), byrow = TRUE) - x[, 1L:dimen, drop = FALSE] %*% 
        t(dm)
    dist <- exp(-(dist - apply(dist, 1L, min, na.rm = TRUE)))
}
# snipped the two other methods ("debiased" and "predictive")
posterior <- dist/drop(dist %*% rep(1, ng))

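This is mainly to show why Gregor's answer is the most sensible approach. Trying to pull out "the equation" does not seem useful. (I remember doing exercises like this with linear regression results in my first-year regression class in graduate school.)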
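For anyone who still wants to see the arithmetic, the plug-in branch quoted above can be replayed by hand on a fitted object. This is only a sketch: the seed and the new observation are made up, and names such as centre, intercept, ld and post are chosen here for illustration rather than taken from MASS.

```r
library(MASS)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

# hypothetical new observation
new_df <- data.frame(Sepal.L. = 5.9, Sepal.W. = 3.0,
                     Petal.L. = 5.1, Petal.W. = 1.8)
x <- as.matrix(new_df)

# prior-weighted grand mean used for centring
centre <- colSums(z$prior * z$means)

# "equation" form: LDk = intercept[k] + sum(z$scaling[, k] * x)
intercept <- -drop(centre %*% z$scaling)
ld <- sweep(x %*% z$scaling, 2, intercept, "+")

# group means projected into the same discriminant space
dm <- scale(z$means, center = centre, scale = FALSE) %*% z$scaling

# plug-in rule, following the quoted source
dist <- matrix(0.5 * rowSums(dm^2) - log(z$prior), nrow(ld),
               length(z$prior), byrow = TRUE) - ld %*% t(dm)
post <- exp(-(dist - apply(dist, 1L, min)))
post <- post / rowSums(post)
colnames(post) <- names(z$prior)

# compare with what predict() returns
p <- predict(z, new_df)
all.equal(unname(ld), unname(p$x))
all.equal(unname(post), unname(p$posterior))
```

The intercept and z$scaling together are the linear discriminant equations the question asks about, but turning the scores into a class still needs the projected group means and the priors, which is the work predict() already does.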

Extracting the linear discriminant equations
