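I have this data, and I want to extract the coefficients of the equation that the model produces, so that I can plug in a new data point and see where it falls.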
library(MASS)
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
table(Iris$Sp[train])
## your answer may differ
## c s v
## 22 23 30
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)
I know I can get this:
> z
Call:
lda(Sp ~ ., data = Iris, prior = c(1, 1, 1)/3, subset = train)
Prior probabilities of groups:
c s v
0.3333333 0.3333333 0.3333333
Group means:
Sepal.L. Sepal.W. Petal.L. Petal.W.
c 5.969231 2.753846 4.311538 1.3384615
s 5.075000 3.541667 1.500000 0.2583333
v 6.700000 2.936000 5.552000 1.9880000
Coefficients of linear discriminants:
LD1 LD2
Sepal.L. -0.5458866 0.5215937
Sepal.W. -1.5312824 1.7891248
Petal.L. 1.8087255 -1.2637188
Petal.W. 2.8620894 3.2868849
Proportion of trace:
LD1 LD2
0.9893 0.0107
But is there a way to get the equation itself, so that I don't have to work out new observations by hand?
Answer 0 (score: 1)
Turning this into an answer. You need predict(); the help page for the predict.lda method in the MASS package has your exact example:
tr <- sample(1:50, 25)
train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])
test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
z <- lda(train, cl)
predict(z, test)$class
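Applied to the model from the question (fitted with the formula interface), the same call scores a single new observation. This is a minimal sketch: the seed and the measurements in new_obs are made-up assumptions for illustration, and the column names have to match the predictors used in the fit.

```r
library(MASS)

set.seed(1)  # assumption: fixed seed so the random split is reproducible
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

# Hypothetical new flower; names must match the predictor columns of Iris
new_obs <- data.frame(Sepal.L. = 5.9, Sepal.W. = 3.0,
                      Petal.L. = 4.2, Petal.W. = 1.5)

pred <- predict(z, new_obs)
pred$class      # predicted group label
pred$posterior  # posterior probability of each group
pred$x          # position of the new point on LD1 and LD2
```

The $x component is the part that answers "where does the new point fall": it holds the coordinates on the linear discriminants, computed from the coefficients printed under "Coefficients of linear discriminants".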
Answer 1 (score: 1)
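The default method is "plug-in", so here is the relevant code from MASS:::predict.lda. object is the fitted object, and x comes from the newdata argument after conversion to a matrix: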
# snipped preamble and error checking
means <- colSums(prior * object$means)
scaling <- object$scaling
x <- scale(x, center = means, scale = FALSE) %*% scaling
dm <- scale(object$means, center = means, scale = FALSE) %*%
scaling
method <- match.arg(method)
dimen <- if (missing(dimen))
length(object$svd)
else min(dimen, length(object$svd))
N <- object$N
if (method == "plug-in") {
dm <- dm[, 1L:dimen, drop = FALSE]
dist <- matrix(0.5 * rowSums(dm^2) - log(prior), nrow(x),
length(prior), byrow = TRUE) - x[, 1L:dimen, drop = FALSE] %*%
t(dm)
dist <- exp(-(dist - apply(dist, 1L, min, na.rm = TRUE)))
}
# snipped two other methods
}
posterior <- dist/drop(dist %*% rep(1, ng))
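This is mainly to show why Gregor's answer is the most sensible approach. Trying to pull out the "equation" doesn't seem worthwhile. (I remember doing that kind of exercise with linear regression results in a first-year graduate school regression course.)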
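For anyone who still wants to see the plug-in rule spelled out, the sketch below repeats the steps of the snippet above by hand and compares the result with predict(). The seed and the names grand_mean, scores, centroids, post and manual_class are assumptions introduced here for illustration; only the fields of the lda object (prior, means, scaling) come from MASS.

```r
library(MASS)

set.seed(1)  # assumption: fixed seed so the random split is reproducible
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

new_x <- as.matrix(Iris[-train, 1:4])  # held-out rows, predictors only

# Center on the prior-weighted mean of the group means, then project
grand_mean <- colSums(z$prior * z$means)
scores    <- scale(new_x,   center = grand_mean, scale = FALSE) %*% z$scaling
centroids <- scale(z$means, center = grand_mean, scale = FALSE) %*% z$scaling

# Per-group quantity: 0.5 * |centroid|^2 - log(prior) - score . centroid
dist <- matrix(0.5 * rowSums(centroids^2) - log(z$prior),
               nrow(scores), length(z$prior), byrow = TRUE) -
  scores %*% t(centroids)

# Turn it into posterior probabilities (row minimum subtracted for stability)
post <- exp(-(dist - apply(dist, 1, min)))
post <- post / rowSums(post)
colnames(post) <- rownames(z$means)

manual_class <- colnames(post)[max.col(post)]

# Compare the hand computation with predict()
ref <- predict(z, Iris[-train, ])
all.equal(unname(post), unname(ref$posterior))
table(manual_class, ref$class)
```

Each column of dist is a linear function of the discriminant scores, so this is the "equation" in question, but it has to be carried around together with the centering vector, the scaling matrix and the priors, which is what predict() already does.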