caret: using train() to perform grouped regression

Asked: 2016-09-28 13:10:04

Tags: r machine-learning regression r-caret

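Hopefully this is not a completely stupid question. I have a dataset df, n = 2228, p = 19, that describes the characteristics of 5 breeds of horses. I would like to model the continuous variable price as a function of the other 17 predictor variables (a mix of categorical and continuous) for each breed, by first splitting the data into training and test sets.

library(tidyverse)
library(caret)
library(glmnet)

# pre-processing reveals no undue correlation, linear dependency or near
# zero variance variables

train <- df %>%
  group_by(breed) %>%
  sample_frac(size = 2/3) %>%
  droplevels()

test <- anti_join(df, train) %>%
  droplevels()

# I imagine I should be somehow able to do this in the following step but can't
# figure it out
model <- train(price ~ ., data = train, method = "glmnet")

test$pred <- predict(model, newdata = test)

As far as I can tell, I have no problem splitting the data into train and test (see the code above). However, I can't figure out how to fit the model grouped by breed. What I would like to do is analogous to what the nlme package offers.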

3 answers:

Answer 0 (score: 1)

I think what you want to do is something like

horse_typex <- df %>% filter(breed == typex)

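for each breed of horse, and then split each of those subsets into test and training sets.

If you want to do a linear regression, you may want to model the log of price instead, since price is likely to be skewed.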
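A minimal sketch of the idea in this answer, using base R only. The data frame here is synthetic and the predictor names height and age are made up for illustration; the real df has 17 predictors. It splits the data by breed, takes a 2/3 training sample within each breed, and fits a plain lm() on log(price) rather than glmnet.

```r
set.seed(1)

# Synthetic stand-in for df (hypothetical columns: height, age)
df <- data.frame(
  breed  = factor(rep(c("A", "B", "C"), each = 60)),
  height = rnorm(180, mean = 160, sd = 8),
  age    = runif(180, min = 2, max = 20)
)
df$price <- exp(7 + 0.01 * df$height - 0.03 * df$age + rnorm(180, sd = 0.2))

# One data frame per breed, with the (now constant) breed column dropped
by_breed <- lapply(split(df, df$breed), function(d) d[, names(d) != "breed"])

# Within each breed: 2/3 of the rows for training, the rest for testing
splits <- lapply(by_breed, function(d) {
  idx <- sample(nrow(d), size = round(nrow(d) * 2 / 3))
  list(train = d[idx, ], test = d[-idx, ])
})

# Fit log(price) per breed, then back-transform predictions with exp()
fits  <- lapply(splits, function(s) lm(log(price) ~ ., data = s$train))
preds <- Map(function(fit, s) exp(predict(fit, newdata = s$test)), fits, splits)
```

Note that exp() of a prediction made on the log scale is not an unbiased estimate of the mean price, so treat the back-transformed values as a rough guide.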

Answer 1 (score: 0)

Try:

library(plyr)  # provides dlply(); load it before dplyr to avoid masking dplyr verbs

models <- dlply(df, "breed", function(d_breed)
  train(price ~ ., data = d_breed, method = "glmnet"))

Answer 2 (score: 0)

I suggest you try this with purrr:

library(purrr)

# Fit one model per breed; .x is the data frame for the current breed
models <- train %>%
            split(.$breed) %>%
            map(~ train(price ~ ., data = .x, method = "glmnet"))

Or with dplyr:

# do() needs a named argument to store non-data-frame results in a list column
models <- train %>%
            group_by(breed) %>%
            do(model = train(price ~ ., data = ., method = "glmnet"))

It's hard to know whether this will work without the data, but it's worth a try.
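For the last step of the question (getting test predictions from per-breed models), something along these lines might work. This is only a sketch on synthetic data: the predictors height, age and sex are invented, and the 5-fold cross-validation in trainControl() is my own choice rather than anything from the question. It assumes the caret and glmnet packages are installed.

```r
library(caret)  # train(), trainControl(), createDataPartition(); "glmnet" method needs glmnet installed

set.seed(42)

# Synthetic stand-in for the question's data (hypothetical predictors)
n <- 300
df <- data.frame(
  breed  = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  height = rnorm(n, 160, 8),
  age    = runif(n, 2, 20),
  sex    = factor(sample(c("mare", "stallion", "gelding"), n, replace = TRUE))
)
df$price <- 2000 + 30 * df$height - 100 * df$age + rnorm(n, sd = 300)

# Stratified 2/3 split: for a factor outcome, sampling is done within each level
in_train <- createDataPartition(df$breed, p = 2/3, list = FALSE)
train_df <- df[in_train, ]
test_df  <- df[-in_train, ]

ctrl <- trainControl(method = "cv", number = 5)

# One caret model per breed; breed is constant within a group, so drop it
models <- lapply(split(train_df, train_df$breed), function(d) {
  train(price ~ ., data = d[, names(d) != "breed"],
        method = "glmnet", trControl = ctrl)
})

# Predict each test row with the model fitted for its own breed
test_df$pred <- NA_real_
for (b in names(models)) {
  rows <- test_df$breed == b
  test_df$pred[rows] <- predict(models[[b]], newdata = test_df[rows, ])
}
```

The data frames are called train_df and test_df here so that they do not share a name with caret's train() function, which makes the code a little easier to read.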