Predicted probabilities from the bartMachine R package are probabilities of failure

Time: 2016-10-26 22:23:28

Tags: r classification bayesian

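If I run a BART classification model with bartMachine, the returned p_hat_train values correspond to the probability of failure, not to the probability of success as in the original BART implementation in the BayesTree R package.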

Here is an example with a simulated binary response:

library(bartMachine)
library(BayesTree)
library(logitnorm)

set.seed(1)                  # make the simulation reproducible
N <- 1000
X <- rnorm(N, 0, 1)
p_true <- invlogit(1.5*X)    # true probability that y = 1
y <- rbinom(N, 1, p_true)

## bartMachine
fit <- bartMachine(data.frame(X), as.factor(y), num_burn_in = 200,
                   num_iterations_after_burn_in = 500)
p_hat <- fit$p_hat_train

## BayesTree
fit2 <- bart(X, as.factor(y), ntree = 50, ndpost = 500)
p_hat2 <- apply(pnorm(fit2$yhat.train), 2, mean)

par(mfrow = c(2,2))
plot(p_hat, p_true, main = 'p_hat_train with bartMachine')
abline(0, 1, col = 'red')
plot(1 - p_hat, p_true, main = '1 - p_hat_train with bartMachine')
abline(0, 1, col = 'red')
plot(p_hat2, p_true, main = 'pnorm(yhat.train) with BayesTree')
abline(0, 1, col = 'red')
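The BayesTree line `apply(pnorm(fit2$yhat.train), 2, mean)` works because `yhat.train` holds one row per kept posterior draw and one column per training observation, on the probit (latent) scale. The sketch below uses a made-up matrix `draws` as a stand-in for `fit2$yhat.train` (its name and values are illustrative, not real BayesTree output) to show that the line computes, for each observation, the posterior mean of the success probability:

```r
# Illustrative stand-in for fit2$yhat.train: 500 posterior draws (rows)
# by 4 observations (columns), on the probit scale.
set.seed(42)
draws <- matrix(rnorm(500 * 4, mean = c(-1, 0, 0.5, 2), sd = 0.3),
                nrow = 500, ncol = 4, byrow = TRUE)

# Map every draw to a probability, then average over draws per observation
p_hat2 <- apply(pnorm(draws), 2, mean)
p_hat2_alt <- colMeans(pnorm(draws))   # same result, vectorised

print(round(p_hat2, 3))
```

Averaging after `pnorm()` (rather than applying `pnorm()` to the mean latent value) gives the posterior mean of the probability itself.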

[Figure: plot of predicted probabilities with bartMachine and BayesTree]

1 Answer:

Answer 0 (score: 2)

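Inspecting the iris example in ?bartMachine shows that bartMachine estimates the probability that an observation is classified as the first level of the y variable, which in your example happens to be 0. To get the result you want, specify the levels when converting y to a factor, i.e.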

fit <- bartMachine(data.frame(X), factor(y, levels = c("1", "0")), 
  num_burn_in = 200,
  num_iterations_after_burn_in = 500)
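Because bartMachine models the first factor level, what matters is the level order. A quick base R check (no bartMachine needed; the short vector `y` is illustrative) shows the default order and two ways to put "1" first. `stats::relevel()` is offered here as an alternative to the answer's `factor(..., levels = )`, assuming bartMachine only looks at the order of the levels:

```r
# Simulated 0/1 response, as in the question
y <- c(1, 0, 1, 0, 0, 0)

# Default: levels are sorted, so "0" is the first level
y_default <- as.factor(y)
levels(y_default)   # "0" "1"

# The answer's fix: "1" becomes the first level
y_answer <- factor(y, levels = c("1", "0"))
levels(y_answer)    # "1" "0"

# Alternative: move "1" to the front with relevel()
y_relevel <- relevel(as.factor(y), ref = "1")
levels(y_relevel)   # "1" "0"
```

Only the level order changes; the observed labels stay the same.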

We can see why this happens by inspecting the code of build_bart_machine:

if (class(y) == "factor" & length(y_levels) == 2) {
    java_bart_machine = .jnew("bartMachine.bartMachineClassificationMultThread")
    y_remaining = ifelse(y == y_levels[1], 1, 0)
    pred_type = "classification"
}
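The recoding line can be reproduced on its own. This sketch takes the six responses shown in the bartMachine output in this answer and assumes that `y_levels` is `levels(y)`, which the snippet suggests but does not show:

```r
# First six responses from the original specification (y coded 0/1)
y <- as.factor(c(1, 0, 1, 0, 0, 0))

# Assumption: y_levels corresponds to levels(y), i.e. c("0", "1")
y_levels <- levels(y)

# Same recoding as build_bart_machine: 1 for the first level ("0"), else 0
y_remaining <- ifelse(y == y_levels[1], 1, 0)

print(data.frame(y, y_remaining))
```

The resulting `y_remaining` column (0, 1, 0, 1, 1, 1) matches the one in the model matrix, so the modelled "success" is y == 0.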

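Looking at the output of bartMachine (with your original specification) shows what is going on: y_remaining is 1 exactly when y is 0, the first level.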

head(cbind(fit$model_matrix_training_data, y))
#             X y_remaining y
# 1 -0.85093975           0 1
# 2  0.20955263           1 0
# 3  0.66489564           0 1
# 4 -0.09574123           1 0
# 5 -1.22480134           1 0
# 6 -0.36176273           1 0
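If refitting is inconvenient, the original fit already contains the answer: with a two-level factor, the probability of the second level is one minus p_hat_train, which is what the `1 - p_hat` panel in the question plots. A minimal sketch, using a hypothetical helper `prob_of_level` (not part of bartMachine) and made-up probabilities:

```r
# Hypothetical helper (not part of bartMachine): turn probabilities of the
# first factor level, such as p_hat_train, into probabilities of `level`.
prob_of_level <- function(p_first, y_levels, level) {
  stopifnot(length(y_levels) == 2, level %in% y_levels)
  if (level == y_levels[1]) p_first else 1 - p_first
}

# Made-up p_hat_train values for a response with levels c("0", "1")
p_hat_train <- c(0.8, 0.3, 0.55)
p_success <- prob_of_level(p_hat_train, y_levels = c("0", "1"), level = "1")
print(p_success)
```

In real use, `y_levels` would be `levels(as.factor(y))` for the response passed to bartMachine.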