Output object of my ranger model (pred) is character instead of numeric when using the caret train() function

Time: 2018-09-26 21:09:11

Tags: r data-science r-caret

I only run into this problem with certain datasets. When I use the following input data, the results look fine:

str(trainDataFrame, list.len = ncol(trainDataFrame))
'data.frame':   486 obs. of  173 variables:
 $ snaive                      : int  
 $ arima                       : num  
 $ ets                         : num  
 $ stl                         : num  
 $ tsAverage                   : num  
 $ horizon                     : Factor w/ 12 levels
 $ OpenLag1                    : num  
 $ OpenLag2                    : num  
 $ OpenLag3                    : num  
 $ CloseLag1                   : num  
 $ CloseLag2                   : num  
 $ CloseLag3                   : num  
 $ US.HR.RecruitingLag1        : int  
 $ US.HR.RecruitingLag2        : int  
 $ US.HR.RecruitingLag3        : int  
 $ US.Employment.rateLag1      : num  
 $ US.Employment.rateLag2      : num  
 $ US.Employment.rateLag3      : num  
 $ Services.Person.HireLag1    : int  
 $ Services.Person.HireLag2    : int  
 $ Services.Person.HireLag3    : int  
 $ target                      : num  
 $ trend                       : int  
 $ season                      : Factor w/ 13 levels 
 $ numericIndex                : num  
 $ arima_ets                   : num  
 $ arima_stl                   : num  
 $ ets_stl                     : num  
 $ arima_ets_stl               : num  
 $ arima_ets_snaive            : num  
 $ arima_stl_snaive            : num  
 $ ets_stl_snaive              : num  

However, when I use the following input data, the output predictions come back as chr:

str(trainDataFrame)
'data.frame':   234 obs. of  46 variables:
 $ snaive               : num  
 $ arima                : num  
 $ ets                  : num  
 $ tsAverage            : num  
 $ horizon              : Factor w/ 12 levels 
 $ HiPoLag1             : num  
 $ HiPoLag2             : num  
 $ HiPoLag3             : num  
 $ Calendar.DaysLag1    : int  
 $ Calendar.DaysLag2    : int  
 $ Calendar.DaysLag3    : int  
 $ Consumption.DaysLag1 : int  
 $ Consumption.DaysLag2 : int  
 $ Consumption.DaysLag3 : int  
 $ target               : num  
 $ trend                : int  
 $ season               : Factor w/ 13 levels 
 $ numericIndex         : num 
 $ arima_ets            : num

Here is the result from the second input data. Note that ranger$pred$pred is chr instead of num:

$ ranger   :List of 23
..$ method      : chr "ranger"
..$ modelInfo   :List of 15 ...
..$ modelType   : chr "Regression"
..$ results     :'data.frame':  5 obs. of  5 variables: ...
..$ pred        :'data.frame':  720 obs. of  5 variables:
.. ..$ pred    : chr [1:720]
.. ..$ obs     : num [1:720]
.. ..$ rowIndex: int [1:720] 102 102 102 102 102 101 114 101 114 101 ...
.. ..$ mtry    : num [1:720] 3 14 26 37 49 3 3 14 14 26 ...
.. ..$ Resample: chr [1:720] "Training02" "Training02" "Training02" 
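
One way to narrow down which hold-out predictions are non-numeric is to coerce the column and look at the values that fail to parse. This is a minimal base-R sketch, assuming the pred data frame has the layout shown in the str() output above; pred_df and its values are a small hypothetical stand-in, not the real data. On the real object you would apply the same steps to trainedModels$ranger$pred.

```r
# Hypothetical stand-in for trainedModels$ranger$pred, with the same column
# layout as the str() output above. The values are made up for illustration.
pred_df <- data.frame(
  pred     = c("12.5", "13.1", "abc", "14.0", "11.9"),
  obs      = c(12.0, 13.5, 12.2, 14.3, 11.8),
  rowIndex = c(102L, 102L, 114L, 101L, 101L),
  mtry     = c(3, 14, 3, 14, 26),
  Resample = rep("Training02", 5),
  stringsAsFactors = FALSE
)

# Check the storage type of the prediction column
print(class(pred_df$pred))

# Coerce to numeric; strings that cannot be parsed become NA (warning suppressed)
pred_num <- suppressWarnings(as.numeric(pred_df$pred))

# Rows whose original prediction string could not be parsed as a number
bad_rows <- pred_df[is.na(pred_num) & !is.na(pred_df$pred), ]
print(bad_rows)
```

If every value parses cleanly, the column holds numeric-looking strings and the issue lies in how it was assembled rather than in the predictions themselves.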

In case you need to see the code, here is what I use to call the train function for both datasets:

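# Base options shared by every model
trControl <- list(verboseIter = TRUE)
# Prepend the custom resampling indexes and keep all hold-out predictions
trControl <- c(list(index = cvindexes[["cvtrainidx"]],
                    indexOut = cvindexes[["cvtestidx"]],
                    savePredictions = "all"),
               trControl)
caretTrainControl <- do.call(caret::trainControl, trControl)
# Train one model per entry of mlParams; each entry supplies the method,
# tuning grid/length, metric and preprocessing for that model
trainedModels <- lapply(
  mlParams,
  function(x) do.call(caret::train, c(list(form = target ~ .,
                                           data = trainDataFrame,
                                           trControl = caretTrainControl),
                                      x))
)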
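
The do.call(caret::train, c(list(...), x)) pattern splices each element of an mlParams entry into the call as a named argument. A minimal base-R sketch of that mechanism follows; mock_train(), params and toy_data are hypothetical stand-ins so it runs without caret installed, and mock_train() only reports which arguments it received.

```r
# Hypothetical stand-in for caret::train: reports the arguments it received
mock_train <- function(form, data, trControl, method, metric = "RMSE", ...) {
  list(method = method,
       metric = metric,
       n_rows = nrow(data),
       extra  = names(list(...)))  # arguments not matched to a formal
}

# Hypothetical parameter list shaped like the knn entry of mlParams
params <- list(
  knn = list(method = "knn",
             tuneGrid = data.frame(k = 1:20),
             metric = "RMSE",
             preProcess = c("zv", "knnImpute", "center", "scale"))
)

toy_data <- data.frame(target = 1:10, x1 = 11:20)

# Fixed arguments are concatenated with each model's own arguments and passed by name
results <- lapply(params, function(x)
  do.call(mock_train, c(list(form = target ~ ., data = toy_data, trControl = list()), x)))

str(results$knn)
```

Because c() on two lists simply concatenates their elements, every entry in an mlParams element is forwarded unchanged, while data and trControl are shared across all models.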

The same mlParams are used in both cases. See below:

$knn
$knn$method
[1] "knn"

$knn$tuneGrid
    k
1   1
2   2
3   3
...
20 20

$knn$metric
[1] "RMSE"

$knn$preProcess
[1] "zv"        "knnImpute" "center"    "scale"    


$glmnet
$glmnet$method
[1] "glmnet"

$glmnet$tuneLength
[1] 50

$glmnet$metric
[1] "RMSE"

$glmnet$preProcess
[1] "zv"        "knnImpute" "center"    "scale"    


$svmRadial
$svmRadial$method
[1] "svmRadial"

$svmRadial$tuneGrid
      C sigma
1    10 1e-05
2   100 1e-05
3  1000 1e-05
...
12 1000 1e-02

$svmRadial$metric
[1] "RMSE"

$svmRadial$preProcess
[1] "zv"        "knnImpute" "center"    "scale"    


$xgbTree
$xgbTree$method
[1] "xgbTree"

$xgbTree$tuneGrid
    nrounds max_depth   eta gamma colsample_bytree min_child_weight
1         1         2 0.005     0              0.3                1
2         2         2 0.005     0              0.3                1
3         3         2 0.005     0              0.3                1
...
900     100         6 0.005     0              0.7                1

$xgbTree$nthread
[1] 1

$xgbTree$metric
[1] "RMSE"

$xgbTree$preProcess
[1] "zv"        "knnImpute" "center"    "scale" 

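I don't understand why ranger$pred$pred ends up as chr instead of num for the second dataset. Has anyone run into this, or does anyone know what is going on? Thanks in advance!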

0 answers