How to convert a Class with three levels to binary 0 and 1 for using glm in caret on the wine dataset

Time: 2016-11-01 09:24:33

Tags: r glm r-caret

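I want to perform linear regression on the wine dataset using 10-fold cross-validation. But my Class variable has 3 levels: '1', '2' and '3'.

Here is the code so far.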

require(boot, quietly = TRUE)
require(caret)

# wine.data downloaded from https://archive.ics.uci.edu/ml/machine-learning-databases/wine/
wine_data <- read.csv("wine.data", header = FALSE)

colnames(wine_data) <- c("Class", "Alcohol", "Malic acid", "Ash", "Alcalinity of ash",
                         "Magnesium", "Total phenols", "Flavanoids", "Nonflavanoid phenols",
                         "Proanthocyanins", "Color intensity", "Hue",
                         "OD280/OD315 of diluted wines", "Proline")

wine_data_lr<- wine_data

wine_data_lr$Class<-as.numeric(wine_data_lr$Class)
wine_data_lr$Magnesium<-as.numeric(wine_data_lr$Magnesium)
wine_data_lr$Proline<-as.numeric(wine_data_lr$Proline)

str(wine_data_lr)

'data.frame':   178 obs. of  14 variables:
 $ Class                       : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Alcohol                     : num  14.2 13.2 13.2 14.4 13.2 ...
 $ Malic acid                  : num  1.71 1.78 2.36 1.95 2.59 1.76 1.87 2.15 1.64 1.35 ...
 $ Ash                         : num  2.43 2.14 2.67 2.5 2.87 2.45 2.45 2.61 2.17 2.27 ...
 $ Alcalinity of ash           : num  15.6 11.2 18.6 16.8 21 15.2 14.6 17.6 14 16 ...
 $ Magnesium                   : num  127 100 101 113 118 112 96 121 97 98 ...
 $ Total phenols               : num  2.8 2.65 2.8 3.85 2.8 3.27 2.5 2.6 2.8 2.98 ...
 $ Flavanoids                  : num  3.06 2.76 3.24 3.49 2.69 3.39 2.52 2.51 2.98 3.15 ...
 $ Nonflavanoid phenols        : num  0.28 0.26 0.3 0.24 0.39 0.34 0.3 0.31 0.29 0.22 ...
 $ Proanthocyanins             : num  2.29 1.28 2.81 2.18 1.82 1.97 1.98 1.25 1.98 1.85 ...
 $ Color intensity             : num  5.64 4.38 5.68 7.8 4.32 6.75 5.25 5.05 5.2 7.22 ...
 $ Hue                         : num  1.04 1.05 1.03 0.86 1.04 1.05 1.02 1.06 1.08 1.01 ...
 $ OD280/OD315 of diluted wines: num  3.92 3.4 3.17 3.45 2.93 2.85 3.58 3.58 2.85 3.55 ...
 $ Proline                     : num  1065 1050 1185 1480 735 ...

ctrl <- trainControl(method = "cv", number = 10, savePredictions = TRUE)

lr_mod_fit <- train(Class ~ .,  data=wine_data_lr, method="glm", family="binomial",trControl = ctrl, tuneLength = 5)

      RMSE        Rsquared  
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :1     NA's   :1    
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: There were 11 warnings (use warnings() to see them)

warnings()

envir, enclos) :
  model fit failed for Fold02: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

3: In eval(expr, envir, enclos) :
  model fit failed for Fold03: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

4: In eval(expr, envir, enclos) :
  model fit failed for Fold04: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

5: In eval(expr, envir, enclos) :
  model fit failed for Fold05: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

6: In eval(expr, envir, enclos) :
  model fit failed for Fold06: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

7: In eval(expr, envir, enclos) :
  model fit failed for Fold07: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

8: In eval(expr, envir, enclos) :
  model fit failed for Fold08: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

9: In eval(expr, envir, enclos) :
  model fit failed for Fold09: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

10: In eval(expr, envir, enclos) :
  model fit failed for Fold10: parameter=none Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

11: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  ... :
  There were missing values in resampled performance measures.

From the error and warning messages, I gather that the class values must be 0 or 1.
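The message appears to come from the binomial family itself rather than from caret. A minimal sketch in base R reproduces it outside of `train`; the toy vectors `x` and `y` are made up for illustration and are not the wine data:

```r
# Toy data (hypothetical): a numeric response coded 1, 2, 3, like Class above.
x <- c(0.5, 1.2, 2.3, 3.1, 4.8, 5.0)
y <- c(1, 1, 2, 2, 3, 3)

# family = binomial accepts a numeric response only within [0, 1],
# so the values 2 and 3 are rejected before any fitting happens.
msg <- tryCatch(
  {
    glm(y ~ x, family = binomial)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Capturing the message with `tryCatch` keeps the script running, which makes it easy to compare against the text in `warnings()`.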

Since 'Class', as the variable being predicted, would normally be a factor, I ran wine_data_lr$Class <- as.factor(wine_data_lr$Class) and re-ran the same code, but got the same error.
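One possible reason the factor conversion alone does not help: in base R's `glm`, a factor response under `family = binomial` is read as "first level" versus "every other level", so the three levels are never modelled separately. The sketch below uses simulated data (the names `cls`, `x`, `not_first` are hypothetical), and caret's own wrapper around `glm` may react differently to a three-level factor:

```r
set.seed(1)
# Simulated stand-in for the wine data: 3 classes and one noisy predictor.
cls <- factor(rep(1:3, each = 30))
x <- as.numeric(cls) + rnorm(90, sd = 1.5)

# Factor response: binomial treats level "1" as failure and all other levels as success.
fit_factor <- glm(cls ~ x, family = binomial)

# Explicit 0/1 recoding of the same contrast (class 1 vs. classes 2 and 3).
not_first <- as.numeric(cls != levels(cls)[1])
fit_binary <- glm(not_first ~ x, family = binomial)

# Both fits estimate the same coefficients.
print(all.equal(coef(fit_factor), coef(fit_binary)))
```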

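Since I specified family = "binomial", there can probably be only two possible levels, whereas my data has three, which may be causing the error. So I tried family = "multinomial", but I still got exactly the same error. How can I fix this? Is there a way to convert the three levels into two binary levels, 0 and 1?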
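A minimal sketch of one way to collapse three levels into 0/1 is a one-vs-rest recoding, shown here on simulated data rather than the real wine file. The data frame `toy` and the columns `IsClass1` and `IsClass1F` are hypothetical names, and which class to single out is an assumption that depends on the question being asked:

```r
set.seed(42)
# Simulated stand-in for wine_data_lr (hypothetical): Class in 1..3 plus one predictor.
toy <- data.frame(Class = rep(1:3, each = 30))
toy$Alcohol <- 12 + 0.5 * toy$Class + rnorm(nrow(toy), sd = 1)

# One-vs-rest recoding: 1 if the row belongs to class 1, 0 otherwise.
toy$IsClass1 <- ifelse(toy$Class == 1, 1, 0)

# The same information as a two-level factor, the form classification tools usually expect.
toy$IsClass1F <- factor(toy$IsClass1, levels = c(0, 1), labels = c("other", "class1"))

# A binomial glm now accepts the response, because every value is 0 or 1.
fit <- glm(IsClass1 ~ Alcohol, data = toy, family = binomial)
print(table(toy$Class, toy$IsClass1))
print(coef(fit))
```

Collapsing this way discards the distinction between classes 2 and 3. If all three classes need to be predicted, a multinomial model such as `nnet::multinom` (available in caret as `method = "multinom"`) is a common alternative, though it is a different model from the `glm` call in the question; base R's `glm` has no multinomial family.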

So far I have googled and looked at https://github.com/topepo/caret/issues/160 and these related questions:
Train function from R caret package error: "Something is wrong; all the Accuracy metric values are missing"
R: Something is wrong; all the Accuracy metric values are missing
getting this error in Caret
"Something is wrong; all the Accuracy metric values are missing" Error in Caret Training
"Something is wrong; all the Accuracy metric values are missing:"

But I still do not quite understand how to solve my problem.

Any help is appreciated!

0 answers:

No answers