Question

我对R作品中 predict.glm 的功能感到困惑。根据帮助，

“terms”选项返回一个矩阵，给出线性预测器标度上模型公式中每个项的拟合值。

因此，如果我的模型的形式为f（y）= X * beta，那么命令

function( std::nullptr_t )

预计

会产生相同的矩阵X，乘以β元素。例如，如果我训练以下模型

predict(model, X, type='terms')

结果系数

test.data = data.frame(y = c(0,0,0,1,1,1,1,1,1), x=c(1,2,3,1,2,2,3,3,3))
model = glm(y~(x==1)+(x==2), family = 'binomial', data = test.data)

设计矩阵

beta <- model$coef

然后乘以它应该看起来像

的系数

X <- model.matrix(y~(x==1)+(x==2), data = test.data)

  (Intercept) x == 1TRUE x == 2TRUE
1           1          1          0
2           1          0          1
3           1          0          0
4           1          1          0
5           1          0          1
6           1          0          1
7           1          0          0
8           1          0          0
9           1          0          0

然而，由pred1 <- t(beta * t(X)) (Intercept) x == 1TRUE x == 2TRUE 1 1.098612 -1.098612 0.0000000 2 1.098612 0.000000 -0.4054651 3 1.098612 0.000000 0.0000000 4 1.098612 -1.098612 0.0000000 5 1.098612 0.000000 -0.4054651 6 1.098612 0.000000 -0.4054651 7 1.098612 0.000000 0.0000000 8 1.098612 0.000000 0.0000000 9 1.098612 0.000000 0.0000000产生的实际矩阵似乎与此无关。以下代码

predict.glm

如何解释这样的结果？

Answer 1

我已经编辑了你的问题，包括＆＃34;更正＆＃34;获取（原始）模型矩阵，模型系数和预期的逐项预测的方法。所以你关于如何获得这些的另一个问题已经解决了。在下文中，我将帮助您理解predict.glm()。

predict.glm()（实际上，predict.lm()）在进行逐项预测时为每个模型术语应用了居中约束。

最初，你有一个模型矩阵

X <- model.matrix(y~(x==1)+(x==2), data = test.data)

但它是居中的，通过删除列意味着：

avx <- colMeans(X)
X1 <- sweep(X, 2L, avx)

> avx
(Intercept)  x == 1TRUE  x == 2TRUE 
  1.0000000   0.2222222   0.3333333 

> X1
  (Intercept) x == 1TRUE x == 2TRUE
1           0  0.7777778 -0.3333333
2           0 -0.2222222  0.6666667
3           0 -0.2222222 -0.3333333
4           0  0.7777778 -0.3333333
5           0 -0.2222222  0.6666667
6           0 -0.2222222  0.6666667
7           0 -0.2222222 -0.3333333
8           0 -0.2222222 -0.3333333
9           0 -0.2222222 -0.3333333

然后使用这个居中的模型矩阵完成逐项计算：

t(beta*t(X1))

  (Intercept) x == 1TRUE x == 2TRUE
1           0 -0.8544762  0.1351550
2           0  0.2441361 -0.2703101
3           0  0.2441361  0.1351550
4           0 -0.8544762  0.1351550
5           0  0.2441361 -0.2703101
6           0  0.2441361 -0.2703101
7           0  0.2441361  0.1351550
8           0  0.2441361  0.1351550
9           0  0.2441361  0.1351550

在居中之后，不同的术语垂直移位以具有零均值。因此，截距将为0.不用担心，通过汇总所有模型项的变化来计算新的截距：

intercept <- as.numeric(crossprod(avx, beta))
# [1] 0.7193212

现在你应该看到predict.glm(, type = "terms")给你的东西了。

predict.glm（，type =“terms”）实际上做了什么？

1 个答案: