Using the "prostate" dataset from the "ElemStatLearn" package.
library(caret)
library(ElemStatLearn)   # provides the 'prostate' data
data(prostate)
trainset = subset(prostate, train)[, 1:9]   # assuming trainset is the built-in training split, dropping the 'train' column
set.seed(3434)
fit.lm = train(data = trainset, lpsa ~ ., method = "lm")
fit.ridge = train(data = trainset, lpsa ~ ., method = "ridge")
fit.lasso = train(data = trainset, lpsa ~ ., method = "lasso")
Comparison of RMSE (at bestTune for the ridge and lasso cases):
fit.lm$results[,"RMSE"]
[1] 0.7895572
fit.ridge$results[fit.ridge$results[,"lambda"]==fit.ridge$bestTune$lambda,"RMSE"]
[1] 0.8231873
fit.lasso$results[fit.lasso$results[,"fraction"]==fit.lasso$bestTune$fraction,"RMSE"]
[1] 0.7779534
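One thing worth noting about this comparison: by default, each train() call draws its own bootstrap resamples, so the three RMSE values above are not computed on identical resamples, and the default tuning grid for ridge may not contain a good lambda. A minimal sketch of a fairer comparison, assuming the same trainset as above (fold count and lambda grid are illustrative choices, not values from the original post):

```r
# Use the same CV folds for every model so the RMSE values are comparable
ctrl = trainControl(method = "cv", number = 10)

set.seed(3434)
fit.lm2 = train(lpsa ~ ., data = trainset, method = "lm", trControl = ctrl)

set.seed(3434)   # same seed before each call -> identical CV folds
fit.ridge2 = train(lpsa ~ ., data = trainset, method = "ridge",
                   trControl = ctrl,
                   tuneGrid = data.frame(lambda = seq(0, 0.1, length = 15)))

# Summarize performance over the shared resamples
summary(resamples(list(lm = fit.lm2, ridge = fit.ridge2)))
```

With matched folds and a denser lambda grid, any remaining RMSE gap reflects the models rather than resampling noise.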
Comparing the absolute values of the coefficients:
abs(round(fit.lm$finalModel$coefficients,2))
(Intercept) lcavol lweight age lbph svi lcp gleason pgg45
0.43 0.58 0.61 0.02 0.14 0.74 0.21 0.03 0.01
abs(round(predict(fit.ridge$finalModel, type = "coef", mode = "norm")$coefficients[8,],2))
lcavol lweight age lbph svi lcp gleason pgg45
0.49 0.62 0.01 0.14 0.65 0.05 0.00 0.01
abs(round(predict(fit.lasso$finalModel, type = "coef", mode = "norm")$coefficients[8,],2))
lcavol lweight age lbph svi lcp gleason pgg45
0.56 0.61 0.02 0.14 0.72 0.18 0.00 0.01
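A caveat about the two predict() calls above: the hard-coded row index [8,] picks the eighth step of the solution path, which is not guaranteed to match bestTune. Assuming caret is using the elasticnet backend for "ridge" and "lasso" (as it does by default), a sketch that reads the coefficients at the tuned value directly:

```r
# Lasso: coefficients at the cross-validated best fraction
predict(fit.lasso$finalModel, type = "coef", mode = "fraction",
        s = fit.lasso$bestTune$fraction)$coefficients

# Ridge: lambda is fixed inside finalModel, so s = 1 (the full path)
# returns the ridge solution at the tuned lambda
predict(fit.ridge$finalModel, type = "coef", mode = "fraction",
        s = 1)$coefficients
```

If these differ from the row-8 values, the earlier coefficient comparison was made at the wrong point on the path.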
My question is: how can the RMSE for "ridge" be higher than for plain "lm"? Doesn't that defeat the purpose of penalized regression compared to plain "lm"?

Also, how can the absolute value of the "lweight" coefficient actually be higher under ridge (0.62) than under lm (0.61)? Without abs(), both coefficients are positive anyway.

I expected ridge to behave similarly to lasso, which not only lowered the RMSE but also shrank the coefficient magnitudes relative to plain "lm".

Thanks!