Question

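I ran a regression analysis in R on a dataset and am trying to predict the contribution of each independent variable to the dependent variable for every row of the dataset.

Something like this: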

set.seed(123)                                              
y <- rnorm(10)                                           
m <- data.frame(v1=rnorm(10), v2=rnorm(10), v3=rnorm(10))
regr <- lm(formula=y~v1+v2+v3, data=m)  
summary(regr)
terms <- predict.lm(regr,m, type="terms")

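In short: run the regression, then use the predict function to compute the terms for v1, v2 and v3 in dataset m. But I'm having a hard time understanding what the predict function is calculating. I expected it to multiply each coefficient from the regression result by the variable's data. For v1, something like this: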

coefficients(regr)[2]*m$v1

But this produces different results from the predict function.

My own calculation:

0.55293884  0.16253411  0.18103537  0.04999729 -0.25108302  0.80717945  0.22488764 -0.88835486  0.31681455 -0.21356803

The predict function's calculation:

0.45870070  0.06829597  0.08679724 -0.04424084 -0.34532115  0.71294132  0.13064950 -0.98259299  0.22257641 -0.30780616

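The predict function's values are off by about 0.1. Also, if you add up all the terms from the predict function together with the constant, the sum does not match the total prediction (using type="response"). What does the predict function calculate here, and how can I tell it to calculate what I did with coefficients(regr)[2]*m$v1?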

Answer 1

All of the following lines produce the same predictions:

# our computed predictions
coefficients(regr)[1] + coefficients(regr)[2]*m$v1 +
  coefficients(regr)[3]*m$v2 + coefficients(regr)[4]*m$v3

# prediction using predict function
predict.lm(regr,m)

# prediction using terms matrix, note that we have to add the constant.
terms_predict = predict.lm(regr,m, type="terms")
terms_predict[,1]+terms_predict[,2]+terms_predict[,3]+attr(terms_predict,'constant')

You can read more about how to use type="terms" here.
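
The agreement between the three routes above can also be checked numerically rather than by eye. The following is a minimal sketch that rebuilds the data and model from the question; the names `by_hand`, `by_predict` and `by_terms` are introduced here for illustration only and do not appear in the original post.

```r
# Rebuild the data and model from the question
set.seed(123)
y <- rnorm(10)
m <- data.frame(v1 = rnorm(10), v2 = rnorm(10), v3 = rnorm(10))
regr <- lm(y ~ v1 + v2 + v3, data = m)

b <- coef(regr)

# Route 1: intercept plus coefficient times variable
by_hand <- b[1] + b["v1"] * m$v1 + b["v2"] * m$v2 + b["v3"] * m$v3

# Route 2: predict() with its default type = "response"
by_predict <- predict(regr, newdata = m)

# Route 3: row sums of the centered terms plus the "constant" attribute
terms_predict <- predict(regr, newdata = m, type = "terms")
by_terms <- rowSums(terms_predict) + attr(terms_predict, "constant")

# unname() drops the row names so that only the values are compared
all.equal(unname(by_hand), unname(by_predict))
all.equal(unname(by_terms), unname(by_predict))
```

Using `rowSums()` instead of adding the columns one by one keeps the check valid if the model gains or loses predictors.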

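Your own calculation (coefficients(regr)[2]*m$v1) and the predict function's calculation (terms_predict[,1]) differ because the columns of the terms matrix are mean-centered, so that each column has a mean of zero: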

# this is equal to terms_predict[,1]
coefficients(regr)[2]*m$v1-mean(coefficients(regr)[2]*m$v1)

# indeed, all columns are centered; i.e. have a mean of 0.
round(sapply(as.data.frame(terms_predict),mean),10)
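
To get the uncentered contributions the question asks for, one option is to skip type="terms" and multiply the columns of the model matrix by the coefficients directly. This is a sketch under the assumption that the model is the plain lm fit from the question; `X`, `raw`, `centered` and `shift` are illustrative names, not part of the original answer. It also shows how the two versions relate: each centered column equals the raw column minus coefficient times the column mean, and the "constant" attribute absorbs those shifts together with the intercept.

```r
# Rebuild the data and model from the question
set.seed(123)
y <- rnorm(10)
m <- data.frame(v1 = rnorm(10), v2 = rnorm(10), v3 = rnorm(10))
regr <- lm(y ~ v1 + v2 + v3, data = m)

b <- coef(regr)
X <- model.matrix(regr)   # columns: (Intercept), v1, v2, v3

# Raw contributions: scale each column of X by its coefficient
raw <- sweep(X, 2, b, "*")

# The v1 column matches the question's own calculation
all.equal(unname(raw[, "v1"]), unname(b["v1"] * m$v1))

# With the intercept column included, each row sums to the fitted value
all.equal(unname(rowSums(raw)), unname(fitted(regr)))

# Relation to type = "terms": add back coefficient * column mean
centered <- predict(regr, type = "terms")
shift <- b[-1] * colMeans(X[, -1])
all.equal(sweep(centered, 2, shift, "+"), raw[, -1],
          check.attributes = FALSE)

# The constant is the intercept plus all of the shifts
all.equal(attr(centered, "constant"), unname(b[1] + sum(shift)))
```

The centered form splits a prediction into an average level plus per-variable deviations from it, while the raw form measures each contribution from zero; both add up to the same fitted values.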

Hope this helps.

Predicting the individual terms of a linear regression
