How do I predict class probabilities for a test dataset with R's caret package?

Date: 2015-06-19 04:36:02

Tags: r random-forest r-caret

Here is my sample dataset:

library(caret)  # provides train(), trainControl() and extractProb()

# TEMP DATA
train_predictors <- matrix(data = c(1,2,
                                    1,3,
                                    2,4,
                                    3,5,
                                    4,6,
                                    5,4,
                                    6,5,
                                    6,6,
                                    7,7,
                                    8,8), nrow = 10, ncol = 2,
                           byrow = TRUE)  # fill row by row so each line above is one observation

train_labels <- c(1,1,1,1,1,0,0,0,0,0)
test_predictors <- matrix(data = c(1,2), nrow = 1, ncol = 2)

# PREPROCESSING OF DATA
train_predictors <- as.data.frame(train_predictors)
test_predictors <- as.data.frame(test_predictors)
train_labels <- as.factor(train_labels)

This is how I train a simple random forest on train_predictors and train_labels:

# APPLY SIMPLE RANDOM FOREST ON TRAIN DATA
my_train_control <- trainControl(method = "cv", 
                                 number = 2, 
                                 savePredictions = TRUE, 
                                 classProbs = TRUE)

rf_model <- train(x = train_predictors, 
                  y = train_labels, 
                  trControl = my_train_control, 
                  tuneLength = 1)

This produces the following warning:

Warning message:
In train.default(x = train_predictors, y = train_labels, trControl = my_train_control,  :
At least one of the class levels are not valid R variables names; This may cause errors if class probabilities are generated because the variables names will be converted to: X0, X1

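But that is only because 0 and 1 are used as class labels (so when the columns of the predictions data frame are created, they are named X0 and X1 instead of 0 and 1), as explained by Max Kuhn (topepo).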
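The renaming described in the warning can be illustrated with base R's make.names(). This is only a sketch of the name conversion itself; it assumes nothing about how caret applies it internally, and the labels "class0" and "class1" are hypothetical examples of valid names.

```r
# Labels that start with a digit are not syntactically valid R names,
# so make.names() prefixes them with "X".
converted <- make.names(c("0", "1"))
print(converted)

# Labels that are already valid names pass through unchanged.
# "class0" and "class1" are hypothetical labels chosen for illustration.
unchanged <- make.names(c("class0", "class1"))
print(unchanged)
```

The mismatch between the original levels ("0", "1") and the converted column names ("X0", "X1") is what the warning is pointing at.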

I am able to extract the class prediction for the test data point as follows:

prediction_class_on_test_data <- predict(rf_model, test_predictors)
prediction_class_on_test_data <- as.numeric(as.character(prediction_class_on_test_data))
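The as.character() step in the conversion above matters because calling as.numeric() directly on a factor returns the internal level codes rather than the labels. A small base R illustration, using a made-up factor:

```r
# A made-up factor with the same labels as in the question.
f <- factor(c("1", "0", "1"))

# Direct conversion returns the integer level codes (levels are "0", "1").
codes <- as.numeric(f)
print(codes)    # 2 1 2

# Going through character first recovers the original labels.
labels <- as.numeric(as.character(f))
print(labels)   # 1 0 1
```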

But when I try to predict the class probabilities for the test data point as follows:

prediction_prob_on_test_data <- predict(rf_model, test_predictors, type = "prob")
prediction_prob_on_test_data <- as.numeric(as.character(prediction_prob_on_test_data))

I get the following error:

Error in `[.data.frame`(out, , obsLevels, drop = FALSE) : 
    undefined columns selected

I am sure there is a simple mistake somewhere, but what am I doing wrong?

Update

I was able to get both the class probabilities and the predictions for the test dataset with the extractProb function, as follows:

dummy_test_labels <- rep(0, nrow(test_predictors))
predictions_on_complete_data <- extractProb(models = list(rf_model), testX = test_predictors, testY = dummy_test_labels)
predictions_on_test_data <- predictions_on_complete_data[predictions_on_complete_data$dataType == "Test", ]

But I am still not sure why predict() does not work with type = "prob".
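One possible workaround, suggested by the warning itself, is to give the factor levels syntactically valid names before training, so that the probability columns and the class levels agree. The sketch below assumes the caret and randomForest packages are installed; the labels "class0" and "class1" and the variable names are hypothetical, and the data frame is written out directly rather than built from a matrix. Depending on the caret version, train() may refuse invalid level names outright instead of only warning.

```r
library(caret)  # method = "rf" also requires the randomForest package

set.seed(42)  # arbitrary seed, only for reproducibility of the sketch

# Same toy data as in the question, one observation per row.
train_predictors <- data.frame(V1 = c(1, 1, 2, 3, 4, 5, 6, 6, 7, 8),
                               V2 = c(2, 3, 4, 5, 6, 4, 5, 6, 7, 8))
test_predictors <- data.frame(V1 = 1, V2 = 2)

# Relabel 0/1 with valid R names (hypothetical labels).
train_labels <- factor(c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
                       levels = c(0, 1),
                       labels = c("class0", "class1"))

my_train_control <- trainControl(method = "cv",
                                 number = 2,
                                 savePredictions = TRUE,
                                 classProbs = TRUE)

rf_model <- train(x = train_predictors,
                  y = train_labels,
                  method = "rf",
                  trControl = my_train_control,
                  tuneLength = 1)

# A data frame with one probability column per class level.
probs <- predict(rf_model, test_predictors, type = "prob")
print(probs)

# Map the predicted class back to the original numeric label if needed.
predicted_class <- predict(rf_model, test_predictors)
predicted_label <- as.numeric(sub("class", "", as.character(predicted_class)))
```

Note that the result of predict(..., type = "prob") is a data frame, so individual columns such as probs$class1 can be read directly rather than converting the whole object with as.numeric(as.character(...)).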

0 answers:

No answers yet.