Question

I have to use a subset called psub to build a good linear regression model.

I created a test set and a training set:

library(dplyr)  # needed for %>% and sample_frac() below

nobs <- nrow(psub)
set.seed(1000)
# Base R split: 70% training, 30% test
train_indices <- sample(seq_len(nobs), floor(0.7 * nobs), replace = FALSE)
test_indices <- setdiff(seq_len(nobs), train_indices)
psub.train <- psub[train_indices, ]
psub.test <- psub[test_indices, ]
# dplyr version of the same split; note that it overwrites the objects above
psub.train <- psub %>% sample_frac(0.70, replace = FALSE)
psub.test <- setdiff(psub, psub.train)

I created a model:

psub.model = lm(PINCP ~ SEX*AGEP*COW*SCHL, data = psub.train)

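Now I would like to know which predictor, or which combination of predictors, is the least significant, without having to look at every p-value in summary(psub.model).

How can I do that?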

Answer 1

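This is not a good way of doing model selection. But if you want to do it anyway, it sounds like what you are looking for is stepwise regression, specifically backward elimination. Many textbooks cover stepwise selection, for example this one.

Code example: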

#predict iris petal length from the other variables
#begin by fitting full model
full_model = lm(Petal.Length ~ Petal.Width + Sepal.Length + Sepal.Width + Species, data = iris)

#backwards elimination
step(full_model, direction = "backward")

This returns the best-fitting model according to AIC, which in this case is the full model.
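If you would rather inspect the result than read the console trace, the following is a minimal sketch of storing what step() returns. It reuses the iris model from above; the name `selected` is only illustrative and not part of the original answer.

```r
# Fit the same full model as above on the built-in iris data
full_model <- lm(Petal.Length ~ Petal.Width + Sepal.Length + Sepal.Width + Species,
                 data = iris)

# trace = 0 suppresses the step-by-step printout
selected <- step(full_model, direction = "backward", trace = 0)

# Formula of the model that step() settled on
formula(selected)

# Record of the elimination path, one row per step, with the AIC at each step
selected$anova
```

The same pattern should carry over to psub.model, although with a four-way interaction the search may take a while.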

Answer 2

Finding the maximum of the vector of p-values (which corresponds to the least significant predictor) should go something like this...

cc <- coef(summary(psub.model))  ## coefficient table
which.max(cc[,"Pr(>|t|)"])
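One caveat with the coefficient table is that the intercept is included and each factor level gets its own row. As a rough sketch of two variations, the code below skips the intercept, and then uses drop1() to test whole terms instead of single coefficients. It runs on iris as a stand-in for psub.model; `fit`, `p_coef`, `d` and `p_term` are illustrative names.

```r
# Example model on the built-in iris data (a stand-in for psub.model)
fit <- lm(Petal.Length ~ Petal.Width + Sepal.Length + Sepal.Width + Species,
          data = iris)

# Coefficient-level: largest p-value, ignoring the intercept row
cc <- coef(summary(fit))
p_coef <- cc[rownames(cc) != "(Intercept)", "Pr(>|t|)"]
names(which.max(p_coef))

# Term-level: one F test per droppable term, so a factor such as
# Species is tested as a whole rather than level by level
d <- drop1(fit, test = "F")
p_term <- d[["Pr(>F)"]]
names(p_term) <- rownames(d)
names(which.max(p_term))   # which.max() skips the NA in the "<none>" row
```

For a model with interactions, drop1() only considers terms that can be removed without violating marginality, so in a model like SEX*AGEP*COW*SCHL it would test only the highest-order interaction.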
