L2-regularized MLR using caret, and how to make sure I am using the best model when predicting

Time: 2016-05-20 01:31:27

Tags: r linear-regression r-caret glmnet

I am trying to fit an L2-regularized (ridge) multiple linear regression to a dataset using caret. Here is what I have done so far:

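library(caret)  # provides createDataPartition(), trainControl() and train()

# Coefficient of determination: 1 - SSE / SST
r_squared <- function(pred, actual) {
    mean_actual <- mean(actual)
    ss_e <- sum((pred - actual)^2)
    ss_total <- sum((actual - mean_actual)^2)
    1 - (ss_e / ss_total)  # return the value visibly instead of ending on an assignment
}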

set.seed(753)  # seed before rnorm() so the simulated data are reproducible too
df <- as.data.frame(matrix(rnorm(10000, 10, 3), 1000))  # 1000 rows, 10 columns
colnames(df)[1] <- "response"
inTraining <- createDataPartition(df[["response"]], p = .75, list = FALSE)
training <- df[inTraining,]
testing  <- df[-inTraining,]
testing_response <- testing["response"]
# alpha = 0 selects the ridge (L2) penalty; unique() drops the duplicated lambda = 1
gridsearch_for_lambda <- expand.grid(alpha = 0,
                                     lambda = unique(c(2^(-15:15), 3^(-15:15))))
regression_formula <- response ~ .
train_control <- trainControl(method = "cv", number = 10,
                              savePredictions = TRUE, allowParallel = FALSE)
model <- train(regression_formula,
               data = training,
               trControl = train_control,
               method = "glmnet",
               tuneGrid = gridsearch_for_lambda,
               preProcess = NULL)
prediction <- predict(model, newdata = testing)
testing_response[["predicted"]] <- prediction
r_sq <- round(r_squared(testing_response[["predicted"]],
                        testing_response[["response"]]), 3)

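My concern is whether the model I am using for prediction is actually the best one, i.e. the one fitted with the optimally tuned lambda value.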
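One way to check this, sketched below under some assumptions: as far as I understand caret, train() refits a final model on the full training set using the tuning values stored in model$bestTune, and predict() on the train object uses that final model. The smaller lambda grid and the names lambda_grid, x_test, pred_caret and pred_glmnet are illustrative choices, not part of the code above. The sketch compares caret's predictions with predictions taken directly from the underlying glmnet fit at the selected lambda.

```r
library(caret)  # method = "glmnet" also requires the glmnet package to be installed

set.seed(753)
df <- as.data.frame(matrix(rnorm(10000, 10, 3), 1000))
colnames(df)[1] <- "response"
in_training <- createDataPartition(df[["response"]], p = 0.75, list = FALSE)
training <- df[in_training, ]
testing  <- df[-in_training, ]

# Smaller illustrative grid (an assumption, not the grid from the question)
lambda_grid <- expand.grid(alpha = 0, lambda = 2^(-5:5))

model <- train(response ~ ., data = training, method = "glmnet",
               trControl = trainControl(method = "cv", number = 10),
               tuneGrid = lambda_grid)

# Tuning values picked by cross-validation (RMSE is the default metric for regression)
best <- model$bestTune
print(best)

# Cross-validated performance for every candidate lambda, to inspect the choice
print(model$results[, c("alpha", "lambda", "RMSE")])

# Predictions from the train object
pred_caret <- predict(model, newdata = testing)

# Predictions from the underlying glmnet fit at the selected lambda
x_test <- as.matrix(testing[, setdiff(names(testing), "response")])
pred_glmnet <- as.numeric(predict(model$finalModel, newx = x_test, s = best$lambda))

# Compare the two sets of predictions
print(all.equal(as.numeric(pred_caret), pred_glmnet))
```

If the two prediction vectors agree, predict(model, newdata = testing) is already using the tuned lambda and no manual model selection is needed. Note that the best lambda is only "best" among the candidates in the grid, so it is worth checking that it does not sit on the edge of the grid.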

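P.S.: The data are sampled from a random normal distribution, so the fit does not give a good R^2 value, but I want to make sure I have the procedure right.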
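To illustrate why pure-noise data gives a poor R^2 with the 1 - SSE/SST definition, here is a small base-R sketch. The simulated vectors and the seed are made up for illustration only. Predicting the mean of the observed values scores exactly 0, and predictions unrelated to the response score below 0, so a near-zero or negative test-set R^2 is expected here and does not by itself indicate a mistake in the tuning.

```r
# Same definition as in the question: 1 - SSE / SST
r_squared <- function(pred, actual) {
  1 - sum((pred - actual)^2) / sum((actual - mean(actual))^2)
}

set.seed(1)                             # illustrative seed
actual <- rnorm(250, 10, 3)             # stand-in for a held-out response
mean_pred <- rep(mean(actual), length(actual))
noise_pred <- rnorm(250, 10, 3)         # predictions unrelated to the response

r2_mean  <- r_squared(mean_pred, actual)   # baseline: always predict the mean
r2_noise <- r_squared(noise_pred, actual)  # unrelated predictions
r2_exact <- r_squared(actual, actual)      # perfect predictions

print(c(mean = r2_mean, noise = r2_noise, exact = r2_exact))
```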

0 answers:

No answers yet