How to iteratively change a variable's value until all predicted probabilities are greater than .5

Posted: 2019-01-04 23:37:31

Tags: r dplyr prediction

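I'm trying to write code that subtracts a given value from a variable until the predicted probability for every row is equal to or greater than .05.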

train <- data.frame('cost'= c(120, 3, 2, 4, 10, 110, 200, 43, 1, 51, 22, 14),
                    'price' = c(120, 20, 10, 4, 3, 4, 30, 43, 56, 88, 75, 44),
                    'dich' = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0))

train$dich <- as.factor(train$dich)

test <- data.frame('cost'= c(13, 5, 32, 22, 14, 145, 54, 134, 11, 14, 33, 21),
                   'price' = c(32, 11, 210, 6, 3, 7, 22, 423, 19, 99, 192, 32)
            )

model <- glm(dich ~ cost + price,
             data = train, 
             family = "binomial")

pred  <-   predict(model, test, type = "response")

           1            2            3            4 
3.001821e-01 4.442316e-01 4.507495e-04 6.310900e-01 
           5            6            7            8 
5.995459e-01 9.888085e-01 7.114101e-01 1.606681e-06 
           9           10           11           12 
4.096450e-01 2.590474e-02 9.908167e-04 3.572890e-01

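So in the output above, cases 4, 5, 6 and 7 would stay unchanged because they are already above .05. For the remaining cases, I want to subtract 1 from the price column, run the prediction again, and repeat until every case has a probability equal to or greater than .05.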
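Because this particular model is a logistic regression, the loop is not strictly required: the log-odds are linear in `price`, so for each row the price at which the predicted probability reaches the threshold can be solved directly with `qlogis()` (the logit). This is a minimal sketch of that closed-form alternative, assuming the question's `glm` fit and a negative coefficient on `price` (lowering the price raises the probability, as the loop results in the answers suggest). The names `threshold`, `price_at_threshold` and `new_price` are illustrative. Rounding down with `floor()` should land on the same whole-number price that stepping down by 1 from an integer price would reach.

```r
# Refit the question's model on the same data
train <- data.frame(cost  = c(120, 3, 2, 4, 10, 110, 200, 43, 1, 51, 22, 14),
                    price = c(120, 20, 10, 4, 3, 4, 30, 43, 56, 88, 75, 44),
                    dich  = factor(c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0)))
test <- data.frame(cost  = c(13, 5, 32, 22, 14, 145, 54, 134, 11, 14, 33, 21),
                   price = c(32, 11, 210, 6, 3, 7, 22, 423, 19, 99, 192, 32))
model <- glm(dich ~ cost + price, data = train, family = "binomial")

threshold <- 0.05
b <- coef(model)
# The closed form below assumes lowering price raises the probability
stopifnot(b[["price"]] < 0)

# Solve b0 + b_cost * cost + b_price * price = qlogis(threshold) for price
price_at_threshold <- (qlogis(threshold) - b[["(Intercept)"]] -
                       b[["cost"]] * test$cost) / b[["price"]]

pred <- predict(model, test, type = "response")
# Rows already at or above the threshold keep their price; the others move to
# the largest whole-number price whose probability reaches the threshold
new_price <- ifelse(pred >= threshold, test$price, floor(price_at_threshold))
new_pred  <- predict(model, transform(test, price = new_price), type = "response")
print(data.frame(test, new_price, new_pred = round(new_pred, 4)))
```

The iterative approach in the answers below remains the more general one, since it also works for models (such as tree ensembles) that have no closed-form inverse.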

3 Answers:

Answer 0 (score: 1)

If you want to subtract 1 separately for each row (or "customer") rather than from all rows at once:

test$pred_prob <- NA
for (n in 1:nrow(test)) {
  print("-----------------------------")
  print(n)
  while (TRUE) {
    pred <- predict(model, test[n, ], type = "response")
    print(pred)
    test$pred_prob[n] <- pred
    # stop once this row's probability reaches the threshold
    if (pred >= 0.05) {
      print(test$price[n])
      break
    }
    # otherwise lower this row's price by 1 and predict again
    test$price[n] <- test$price[n] - 1
  }
}
print(test)

# cost price  pred_prob
# 1    13    32 0.30018209
# 2     5    11 0.44423163
# 3    32    96 0.05128337
# 4    22     6 0.63109001
# 5    14     3 0.59954586
# 6   145     7 0.98880854
# 7    54    22 0.71141007
# 8   134   175 0.05074762
# 9    11    19 0.40964501
# 10   14    82 0.05149897
# 11   33    97 0.05081947
# 12   21    32 0.35728897

Answer 1 (score: 0)

I see what you're trying to do, but the result is quite interesting. If you want to subtract 1 from every element of price on each pass:

x <- 1
while (TRUE) {
  print("----------------------------------------")
  print(x)
  test$price <- test$price - 1
  pred <- predict(model, test, type = "response")
  print(pred)
  x <- x + 1
  if (sum(pred > 0.05) == length(pred)) { 
    print(test)
    break 
  }
}
# ... loops 247 times
# [1] "----------------------------------------"
# [1] 248
# 1          2          3          4          5          6          7          8          9         10         11         12 
# 0.99992994 0.99996240 0.93751936 0.99998243 0.99997993 0.99999966 0.99998781 0.05074762 0.99995669 0.99887117 0.97058913 0.99994594 
# cost price
# 1    13  -216
# 2     5  -237
# 3    32   -38
# 4    22  -242
# 5    14  -245
# 6   145  -241
# 7    54  -226
# 8   134   175
# 9    11  -229
# 10   14  -149
# 11   33   -56
# 12   21  -216
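The loop above only stops when the slowest row (row 8) crosses the threshold, which is why every other price is driven far below zero. A middle ground between the two approaches, sketched below under the same `glm` setup: on each pass, subtract 1 only from the rows that are still below the threshold. Rows are updated independently with the same step rule, so this should end at the same prices as the per-row loop in the first answer, but with one `predict()` call per pass instead of one per row per step. The `max_iter` cap is an added safeguard (an assumption, not part of either answer) against looping forever if lowering the price never raises the probability.

```r
# Same data and model as in the question
train <- data.frame(cost  = c(120, 3, 2, 4, 10, 110, 200, 43, 1, 51, 22, 14),
                    price = c(120, 20, 10, 4, 3, 4, 30, 43, 56, 88, 75, 44),
                    dich  = factor(c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0)))
test <- data.frame(cost  = c(13, 5, 32, 22, 14, 145, 54, 134, 11, 14, 33, 21),
                   price = c(32, 11, 210, 6, 3, 7, 22, 423, 19, 99, 192, 32))
model <- glm(dich ~ cost + price, data = train, family = "binomial")

threshold <- 0.05
max_iter  <- 10000  # safety cap (assumed value, not from the original answers)
for (i in seq_len(max_iter)) {
  pred  <- predict(model, test, type = "response")
  below <- pred < threshold
  if (!any(below)) break
  # Lower the price only where the probability is still too low
  test$price[below] <- test$price[below] - 1
}
test$pred_prob <- predict(model, test, type = "response")
print(test)
```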

Answer 2 (score: 0)

In case anyone else wants to run the same thing with an xgboost model:

library(xgboost)

train <- data.frame('cost'= c(120, 3, 2, 4, 10, 110, 200, 43, 1, 51, 22, 14),
                    'price' = c(120, 20, 10, 4, 3, 4, 30, 43, 56, 88, 75, 44))

label <- data.frame('dich' = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0))

train <- as.matrix(train)

label <- as.matrix(label)

model <- xgboost(data = train,
                 label = label,
                 max_depth = 3,
                 nrounds = 1,
                 objective = "binary:logistic")

test <- data.frame('cost'= c(13, 5, 32, 22, 14, 145, 54, 134, 11, 14, 33, 21),
                   'price' = c(32, 11, 210, 6, 3, 7, 22, 423, 19, 99, 192, 32)
)

test <- as.matrix(test)

#FOR A MATRIX

test <- cbind(test, rep(NA, nrow(test)))
colnames(test)[3] <- c("pred_prob")

for (n in 1:nrow(test)) {
  print("-----------------------------")
  print(n)
  while (TRUE) {
    # predict from the model's two features only (leave out the pred_prob column)
    pred <- predict(model, test[n, c("cost", "price"), drop = FALSE], type = "response")
    print(pred)
    test[n, "pred_prob"] <- pred
    if (pred >= 0.5) {
      print(test[n, "pred_prob"])
      break
    }
    # lower this row's price by 0.01 and predict again
    test[n, "price"] <- test[n, "price"] - 0.01
  }
}
print(test)

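Running it on the 12 rows seems to take a while. I need to think more about the thresholds in the tree model and about how a range of different changes to price would get the probability to .5 or above (which is what I meant in my original question, but I wrote .05, haha).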
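Tree ensembles such as xgboost produce a piecewise-constant prediction along each feature, so a fixed 0.01 step spends most of its time on flat stretches and, if no split on `price` can lift a row above the threshold, never stops. Below is a sketch of a more defensive search, assuming a user-supplied prediction function: `lower_until()`, `pred_fun` and `min_price` are hypothetical names introduced here, and the floor of 0 is an arbitrary choice. It is demonstrated with the question's `glm` model so it runs without xgboost installed; for the xgboost model, `pred_fun` could instead call `predict(model, as.matrix(row[, c("cost", "price")]))`.

```r
# Step `price` down for one row until pred_fun(row) reaches `threshold`.
# Returns NA instead of looping forever once price would drop below min_price.
lower_until <- function(row, pred_fun, threshold, step = 1, min_price = 0) {
  while (pred_fun(row) < threshold) {
    if (row$price - step < min_price) return(NA_real_)
    row$price <- row$price - step
  }
  row$price
}

# Stand-in model: the question's logistic regression
train <- data.frame(cost  = c(120, 3, 2, 4, 10, 110, 200, 43, 1, 51, 22, 14),
                    price = c(120, 20, 10, 4, 3, 4, 30, 43, 56, 88, 75, 44),
                    dich  = factor(c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0)))
test <- data.frame(cost  = c(13, 5, 32, 22, 14, 145, 54, 134, 11, 14, 33, 21),
                   price = c(32, 11, 210, 6, 3, 7, 22, 423, 19, 99, 192, 32))
model <- glm(dich ~ cost + price, data = train, family = "binomial")
pred_fun <- function(row) predict(model, row, type = "response")

# Search each row independently; rows already above the threshold keep their price
new_price <- vapply(seq_len(nrow(test)), function(n)
  lower_until(test[n, ], pred_fun, threshold = 0.05, min_price = 0),
  numeric(1))
print(cbind(test, new_price))
```

An `NA` in `new_price` flags a row that cannot reach the threshold within the allowed price range, which is more informative than a loop that never returns.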