caret: train() function - Error in train.default(x, y, weights = w, ...) : Stopping

Asked: 2017-04-05 07:05:22

Tags: r machine-learning svm random-forest gbm

I am trying to predict a binary variable (spending level) with caret in R.

My dataset has 415,000 rows and 30 features (all of them factors). I need to compare the performance of several machine learning algorithms.

# Rename the factor levels of the target
levels(tableRFM_train$niveau_Depense) <- c("A", "B")
# Sample rows and keep the subsample
tableRFM_train <- tableRFM_train[sample(1:nrow(tableRFM_train), size = 45000), ]

Random forest

When I try to tune the mtry parameter on a subsample of more than 45,000 rows, I get the error below (with a smaller subsample I do not get it):

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :1    NA's   :1   
Error in train.default(x, y, weights = w, ...) : Stopping
De plus : Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.

I don't understand... I have tried several things to fix the problem:

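  • Removing variables with missing values (missing values are grouped into a factor level): sum(is.na(tableRFM_train)) = 0
  • Removing unbalanced variables and zero-variance variables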
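The two checks listed above can be sketched in base R as follows. This is a minimal illustration on a made-up table: `toy` and its columns are hypothetical stand-ins for tableRFM_train, and the "zero variance" test here is simply "only one distinct value", which may be narrower than what was actually applied.

```r
# Toy stand-in for tableRFM_train (hypothetical data, not the real table)
toy <- data.frame(
  f1     = factor(c("a", "b", "a", "b")),
  f2     = factor(c("x", "x", "x", "x")),  # constant column: zero variance
  target = factor(c("A", "B", "A", "B"))
)

# Total number of missing values across the whole table
n_missing <- sum(is.na(toy))

# Number of distinct values per column; a column with fewer than 2 is constant
n_distinct <- vapply(toy, function(col) length(unique(col)), integer(1))
zero_var <- names(n_distinct)[n_distinct < 2]

# Drop the constant columns
toy_clean <- toy[, setdiff(names(toy), zero_var), drop = FALSE]
```

For near-constant (rather than strictly constant) predictors, caret also provides nearZeroVar(), which would be the more usual tool alongside train().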

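library(caret)
library(doParallel)

# Candidate values of mtry to evaluate
paramgrid <- data.frame(mtry = seq(1, 3, by = 1))

# Start a 6-worker cluster for parallel resampling
cl <- makeCluster(6)
registerDoParallel(cl)

# 10-fold cross-validation over the mtry grid
rfopt <- train(niveau_Depense ~ ., data = tableRFM_train, method = "rf",
               trControl = trainControl(method = "cv", number = 10,
                                        search = "grid", classProbs = TRUE,
                                        allowParallel = TRUE),
               tuneGrid = paramgrid,
               prox = TRUE)  # , na.remove = T

stopCluster(cl)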
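As a side calculation on the prox=TRUE argument in the call above: the proximity output of randomForest is an n-by-n matrix, so its size grows with the square of the number of rows. The sketch below only estimates that size, assuming 8-byte doubles. `proximity_gib` is a hypothetical helper, and the post does not establish whether this has anything to do with the error.

```r
# Hypothetical helper: approximate size of an n x n matrix of doubles, in GiB
proximity_gib <- function(n) {
  cells <- n^2          # one cell per pair of rows
  cells * 8 / 1024^3    # 8 bytes per double, converted to GiB
}

# Estimated sizes for a few subsample sizes
sizes <- data.frame(rows = c(10000, 45000, 200000))
sizes$gib <- proximity_gib(sizes$rows)
print(sizes)
```

Under these assumptions, a 45,000-row subsample corresponds to roughly 15 GiB for a single proximity matrix, before accounting for parallel workers.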

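When I tested tuning GBM with a subsample of 200,000 rows, I got the same error, with NA's :1 for both Accuracy and Kappa. When I tune SVM with a subsample of more than 45,000 rows, I also get this error.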

I do not have any missing values in the target or in the other variables.

Output of str(tableRFM_train):
'data.frame':   276664 obs. of  28 variables:
 $ q_pm_p_1           : Factor w/ 4 levels "[ 1 - 21 ]","[ 22 - 32 ]",..: 1 4 3 3 2 3 1 1 4 1 ...
 $ q_pm_p_2           : Factor w/ 4 levels "[ 1 - 23 ]","[ 24 - 40 ]",..: 2 4 1 2 1 3 1 4 1 1 ...
 $ q_pm_p_3           : Factor w/ 4 levels "[ 1 - 25 ]","[ 26 - 43 ]",..: 3 4 4 2 4 3 3 4 2 2 ...
 $ q_ir_p_1           : Factor w/ 4 levels "[ 6 - 10 ]","[ 11 - 16 ]",..: 2 1 2 1 4 1 4 1 2 4 ...
 $ q_ir_p_2           : Factor w/ 4 levels "[ 0 - 6 ]","[ 7 - 14 ]",..: 2 1 2 2 4 4 2 1 2 2 ...
 $ q_ir_p_3           : Factor w/ 3 levels "[ 0 - 7 ]","[ 8 - 16 ]",..: 2 1 1 2 1 2 3 1 3 1 ...
 $ q_evol_pm_p_3_p_2  : Factor w/ 4 levels "[ -100 - -40 ]",..: 1 3 3 2 3 1 1 3 1 1 ...
 $ q_evol_pm_p_2_p_1  : Factor w/ 4 levels "[ -100 - -25 ]",..: 1 3 4 3 3 1 2 3 4 2 ...
 $ q_ecart_ir_p_3_p_2 : Factor w/ 4 levels "[ -20 - -2 ]",..: 1 2 4 2 4 4 1 2 1 4 ...
 $ q_ecart_ir_p_2_p_1 : Factor w/ 4 levels "[ -14 - -1 ]",..: 3 4 3 2 2 1 3 3 3 3 ...
 $ q_age              : Factor w/ 12 levels "[18-24]","[25-29]",..: 10 3 3 12 12 8 3 9 12 10 ...
 $ q_anciennete       : Factor w/ 3 levels "[ 0 - 4 ]","[ 5 - 5 ]",..: 2 1 2 3 2 2 2 1 2 2 ...
 $ q_attachement_mag  : Factor w/ 2 levels "Faible","Fort": 1 1 1 1 1 1 1 1 1 2 ...
 $ q_diversification  : Factor w/ 4 levels "[ 1 - 3 ]","[ 4 - 6 ]",..: 4 4 2 4 2 1 4 1 2 4 ...
 $ q_indice_coordo    : Factor w/ 4 levels "[ 0 - 4 ]","[ 5 - 5 ]",..: 3 3 1 1 2 3 2 1 3 3 ...
 $ q_recence_p_1      : Factor w/ 4 levels "[ 0 - 11 ]","[ 12 - 30 ]",..: 4 1 2 3 2 2 1 4 3 1 ...
 $ q_recence_p_2      : Factor w/ 4 levels "[ 0 - 12 ]","[ 13 - 45 ]",..: 2 4 3 2 2 2 3 4 3 3 ...
 $ q_recence_p_3      : Factor w/ 4 levels "[ 0 - 14 ]","[ 15 - 47 ]",..: 2 4 4 2 4 2 2 4 1 3 ...
 $ q_frequence_p_1    : Factor w/ 4 levels "[ 1 - 2 ]","[ 3 - 5 ]",..: 2 1 2 1 3 1 2 1 2 4 ...
 $ q_frequence_p_2    : Factor w/ 4 levels "[ 1 - 3 ]","[ 4 - 7 ]",..: 1 4 1 1 2 2 2 4 1 2 ...
 $ q_frequence_p_3    : Factor w/ 4 levels "[ 1 - 3 ]","[ 4 - 7 ]",..: 1 4 4 1 4 1 2 4 2 1 ...
 $ q_presence_p_1     : Factor w/ 4 levels "[ 1 - 2 ]","[ 3 - 4 ]",..: 2 1 2 1 3 1 2 1 2 4 ...
 $ q_presence_p_2     : Factor w/ 4 levels "[ 1 - 3 ]","[ 4 - 5 ]",..: 1 4 1 1 3 3 2 4 1 2 ...
 $ q_presence_p_3     : Factor w/ 4 levels "[ 1 - 3 ]","[ 4 - 6 ]",..: 1 4 4 1 4 1 2 4 2 1 ...
 $ q_delai_reachat_p_1: Factor w/ 4 levels "0-13","14-24",..: 3 4 3 4 3 4 3 4 3 2 ...
 $ q_delai_reachat_p_2: Factor w/ 5 levels "0-14","15-26",..: 5 4 5 5 3 2 1 4 5 1 ...
 $ q_delai_reachat_p_3: Factor w/ 5 levels "0-13","14-24",..: 5 4 4 5 4 5 3 4 3 5 ...
 $ niveau_Depense     : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...

R version 3.3.3 (2017-03-06) -- "Another Canoe", Copyright (C) 2017 The R Foundation for Statistical Computing, Platform: x86_64-pc-linux-gnu (64-bit), with the latest version of caret.

0 answers:

No answers