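I am trying to predict a binary variable (spending level) with caret in R.
My dataset has 415,000 rows and 30 features (all of them factors). I need to compare the performance of several machine learning algorithms.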
# Rename the levels of the target factor
levels(tableRFM_train$niveau_Depense) <- c("A", "B")
# Draw a random sub-sample of rows
tableRFM_train <- tableRFM_train[sample(seq_len(nrow(tableRFM_train)), size = 45000), ]
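A quick sanity check on the sub-sample might look like the sketch below. It uses a hypothetical toy data frame (`full`, `sub`) and an arbitrary seed rather than my real table, and only shows how to confirm that both classes and all factor levels are still present after sampling.

```r
set.seed(42)  # arbitrary seed, only to make this sketch reproducible

# Hypothetical stand-in for tableRFM_train (all columns are factors)
full <- data.frame(
  q_age = factor(sample(c("[18-24]", "[25-29]", "[30-34]"), 1000, replace = TRUE)),
  niveau_Depense = factor(sample(c("A", "B"), 1000, replace = TRUE))
)

# Same sampling pattern as above, on a smaller scale
sub <- full[sample(seq_len(nrow(full)), size = 100), ]

# Class counts of the target in the sub-sample
class_counts <- table(sub$niveau_Depense)
print(class_counts)

# Number of levels actually observed in each column of the sub-sample
observed_levels <- sapply(sub, function(col) nlevels(droplevels(col)))
print(observed_levels)
```

If a class count is zero, or a column shows fewer observed levels than it has in the full table, the sub-sample has lost a category.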
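Random Forest
When I try to tune the mtry parameter on a sub-sample of more than 45,000 rows, I get the error below (with a smaller sub-sample I do not get it):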
Something is wrong; all the Accuracy metric values are missing:
Accuracy Kappa
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :1 NA's :1
Error in train.default(x, y, weights = w, ...) : Stopping
De plus : Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
I don't understand it. This is the code I run, after trying several things to solve the problem:
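library(caret)
library(doParallel)
# Candidate values for mtry
paramgrid <- data.frame(mtry = seq(1, 3, by = 1))
# Start a 6-worker cluster for the resampling loop
cl <- makeCluster(6)
registerDoParallel(cl)
rfopt <- train(niveau_Depense ~ ., data = tableRFM_train, method = "rf",
               trControl = trainControl(method = "cv", number = 10, search = "grid",
                                        classProbs = TRUE, allowParallel = TRUE),
               tuneGrid = paramgrid, prox = TRUE)  # , na.remove = T
stopCluster(cl)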
When I tune GBM on a sub-sample of 200,000 rows, I get the same error, with NA's :1 for both Accuracy and Kappa. When I tune an SVM on a sub-sample of more than 45,000 rows, I also get this error.
I have no missing values in the target or in any of the other variables.
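For reference, a check along these lines is one way to confirm that. This is a minimal sketch on a hypothetical toy data frame `df` (with one NA planted on purpose), not my actual table:

```r
# Hypothetical stand-in for tableRFM_train, with one deliberate NA
df <- data.frame(
  q_age = factor(c("[18-24]", "[25-29]", NA, "[18-24]")),
  niveau_Depense = factor(c("A", "B", "A", "B"))
)

# Missing values per column; all zeros would mean no NA anywhere
na_per_column <- colSums(is.na(df))
print(na_per_column)

# Single TRUE/FALSE answer for the whole table
has_na <- any(is.na(df))
print(has_na)
```

On the real table, `na_per_column` should be zero for every column and `has_na` should be FALSE.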
Output of str(tableRFM_train):
'data.frame': 276664 obs. of 28 variables:
$ q_pm_p_1 : Factor w/ 4 levels "[ 1 - 21 ]","[ 22 - 32 ]",..: 1 4 3 3 2 3 1 1 4 1 ...
$ q_pm_p_2 : Factor w/ 4 levels "[ 1 - 23 ]","[ 24 - 40 ]",..: 2 4 1 2 1 3 1 4 1 1 ...
$ q_pm_p_3 : Factor w/ 4 levels "[ 1 - 25 ]","[ 26 - 43 ]",..: 3 4 4 2 4 3 3 4 2 2 ...
$ q_ir_p_1 : Factor w/ 4 levels "[ 6 - 10 ]","[ 11 - 16 ]",..: 2 1 2 1 4 1 4 1 2 4 ...
$ q_ir_p_2 : Factor w/ 4 levels "[ 0 - 6 ]","[ 7 - 14 ]",..: 2 1 2 2 4 4 2 1 2 2 ...
$ q_ir_p_3 : Factor w/ 3 levels "[ 0 - 7 ]","[ 8 - 16 ]",..: 2 1 1 2 1 2 3 1 3 1 ...
$ q_evol_pm_p_3_p_2 : Factor w/ 4 levels "[ -100 - -40 ]",..: 1 3 3 2 3 1 1 3 1 1 ...
$ q_evol_pm_p_2_p_1 : Factor w/ 4 levels "[ -100 - -25 ]",..: 1 3 4 3 3 1 2 3 4 2 ...
$ q_ecart_ir_p_3_p_2 : Factor w/ 4 levels "[ -20 - -2 ]",..: 1 2 4 2 4 4 1 2 1 4 ...
$ q_ecart_ir_p_2_p_1 : Factor w/ 4 levels "[ -14 - -1 ]",..: 3 4 3 2 2 1 3 3 3 3 ...
$ q_age : Factor w/ 12 levels "[18-24]","[25-29]",..: 10 3 3 12 12 8 3 9 12 10 ...
$ q_anciennete : Factor w/ 3 levels "[ 0 - 4 ]","[ 5 - 5 ]",..: 2 1 2 3 2 2 2 1 2 2 ...
$ q_attachement_mag : Factor w/ 2 levels "Faible","Fort": 1 1 1 1 1 1 1 1 1 2 ...
$ q_diversification : Factor w/ 4 levels "[ 1 - 3 ]","[ 4 - 6 ]",..: 4 4 2 4 2 1 4 1 2 4 ...
$ q_indice_coordo : Factor w/ 4 levels "[ 0 - 4 ]","[ 5 - 5 ]",..: 3 3 1 1 2 3 2 1 3 3 ...
$ q_recence_p_1 : Factor w/ 4 levels "[ 0 - 11 ]","[ 12 - 30 ]",..: 4 1 2 3 2 2 1 4 3 1 ...
$ q_recence_p_2 : Factor w/ 4 levels "[ 0 - 12 ]","[ 13 - 45 ]",..: 2 4 3 2 2 2 3 4 3 3 ...
$ q_recence_p_3 : Factor w/ 4 levels "[ 0 - 14 ]","[ 15 - 47 ]",..: 2 4 4 2 4 2 2 4 1 3 ...
$ q_frequence_p_1 : Factor w/ 4 levels "[ 1 - 2 ]","[ 3 - 5 ]",..: 2 1 2 1 3 1 2 1 2 4 ...
$ q_frequence_p_2 : Factor w/ 4 levels "[ 1 - 3 ]","[ 4 - 7 ]",..: 1 4 1 1 2 2 2 4 1 2 ...
$ q_frequence_p_3 : Factor w/ 4 levels "[ 1 - 3 ]","[ 4 - 7 ]",..: 1 4 4 1 4 1 2 4 2 1 ...
$ q_presence_p_1 : Factor w/ 4 levels "[ 1 - 2 ]","[ 3 - 4 ]",..: 2 1 2 1 3 1 2 1 2 4 ...
$ q_presence_p_2 : Factor w/ 4 levels "[ 1 - 3 ]","[ 4 - 5 ]",..: 1 4 1 1 3 3 2 4 1 2 ...
$ q_presence_p_3 : Factor w/ 4 levels "[ 1 - 3 ]","[ 4 - 6 ]",..: 1 4 4 1 4 1 2 4 2 1 ...
$ q_delai_reachat_p_1: Factor w/ 4 levels "0-13","14-24",..: 3 4 3 4 3 4 3 4 3 2 ...
$ q_delai_reachat_p_2: Factor w/ 5 levels "0-14","15-26",..: 5 4 5 5 3 2 1 4 5 1 ...
$ q_delai_reachat_p_3: Factor w/ 5 levels "0-13","14-24",..: 5 4 4 5 4 5 3 4 3 5 ...
$ niveau_Depense : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
R version 3.3.3 (2017-03-06) -- "Another Canoe", Copyright (C) 2017 The R Foundation for Statistical Computing, Platform: x86_64-pc-linux-gnu (64-bit), with the latest version of caret.