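I'm trying to write code that subtracts a given value from a variable until the predicted probability for every row is equal to or greater than .05.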
train <- data.frame('cost'= c(120, 3, 2, 4, 10, 110, 200, 43, 1, 51, 22, 14),
'price' = c(120, 20, 10, 4, 3, 4, 30, 43, 56, 88, 75, 44),
'dich' = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0))
train$dich <- as.factor(train$dich)
test <- data.frame('cost'= c(13, 5, 32, 22, 14, 145, 54, 134, 11, 14, 33, 21),
'price' = c(32, 11, 210, 6, 3, 7, 22, 423, 19, 99, 192, 32)
)
model <- glm(dich ~ cost + price,
data = train,
family = "binomial")
pred <- predict(model, test, type = "response")
1 2 3 4
3.001821e-01 4.442316e-01 4.507495e-04 6.310900e-01
5 6 7 8
5.995459e-01 9.888085e-01 7.114101e-01 1.606681e-06
9 10 11 12
4.096450e-01 2.590474e-02 9.908167e-04 3.572890e-01
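So in the output above, cases 4, 5, 6, and 7 would stay as they are because they are already above .05, but for the remaining cases I'd like to subtract 1 from the price column, run the prediction again, and repeat until every case has a probability equal to or greater than 0.05.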
Answer 0 (score: 1)
If you want to subtract 1 for each row (or "customer") separately, rather than from every row at once:
test$pred_prob <- NA
for (n in seq_len(nrow(test))) {
  print("-----------------------------")
  print(n)
  while (TRUE) {
    # Predict for row n only, using its current price.
    pred <- predict(model, test[n, ], type = "response")
    print(pred)
    test$pred_prob[n] <- pred
    # Stop once this row is at or above the threshold.
    if (pred >= 0.05) {
      print(test$price[n])
      break
    }
    # Otherwise lower this row's price by 1 and try again.
    test$price[n] <- test$price[n] - 1
  }
  print(test)
}
# cost price pred_prob
# 1 13 32 0.30018209
# 2 5 11 0.44423163
# 3 32 96 0.05128337
# 4 22 6 0.63109001
# 5 14 3 0.59954586
# 6 145 7 0.98880854
# 7 54 22 0.71141007
# 8 134 175 0.05074762
# 9 11 19 0.40964501
# 10 14 82 0.05149897
# 11 33 97 0.05081947
# 12 21 32 0.35728897
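Because the model here is a plain logistic regression, the stopping price can also be computed directly instead of looping. The following is only a sketch under the assumption that the fitted price coefficient is negative (which the outputs above suggest, since lowering price raises the probability); `threshold`, `boundary`, `steps` and `adjusted` are names introduced for illustration. It solves logit(threshold) = b0 + b1 * cost + b2 * price for price, then counts how many whole 1-unit steps are needed to get there.

```r
# Same toy data and model as in the question.
train <- data.frame(
  cost  = c(120, 3, 2, 4, 10, 110, 200, 43, 1, 51, 22, 14),
  price = c(120, 20, 10, 4, 3, 4, 30, 43, 56, 88, 75, 44),
  dich  = factor(c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0))
)
test <- data.frame(
  cost  = c(13, 5, 32, 22, 14, 145, 54, 134, 11, 14, 33, 21),
  price = c(32, 11, 210, 6, 3, 7, 22, 423, 19, 99, 192, 32)
)
model <- glm(dich ~ cost + price, data = train, family = "binomial")

threshold <- 0.05
b <- coef(model)

# Assumption: a lower price means a higher probability (negative coefficient).
stopifnot(b[["price"]] < 0)

# Price at which the linear predictor equals logit(threshold), per row.
boundary <- (qlogis(threshold) - b[["(Intercept)"]] - b[["cost"]] * test$cost) /
  b[["price"]]

# Whole 1-unit steps needed to reach the boundary (0 if already at or below it).
steps <- pmax(0, ceiling(test$price - boundary))

adjusted <- test
adjusted$price <- test$price - steps
adjusted$pred_prob <- predict(model, adjusted, type = "response")
print(adjusted)
```

This only works because the linear predictor is monotone in price; for a model without that property the row-by-row loop is the safer choice.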
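Answer 1 (score: 0)
I see what you're trying to do, but the results are rather interesting. If you want to subtract 1 from every element of price on each pass: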
x <- 1
while (TRUE) {
print("----------------------------------------")
print(x)
test$price <- test$price - 1
pred <- predict(model, test, type = "response")
print(pred)
x <- x + 1
if (sum(pred > 0.05) == length(pred)) {
print(test)
break
}
}
# ... loops 247 times
# [1] "----------------------------------------"
# [1] 248
# 1 2 3 4 5 6 7 8 9 10 11 12
# 0.99992994 0.99996240 0.93751936 0.99998243 0.99997993 0.99999966 0.99998781 0.05074762 0.99995669 0.99887117 0.97058913 0.99994594
# cost price
# 1 13 -216
# 2 5 -237
# 3 32 -38
# 4 22 -242
# 5 14 -245
# 6 145 -241
# 7 54 -226
# 8 134 175
# 9 11 -229
# 10 14 -149
# 11 33 -56
# 12 21 -216
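For comparison, here is a sketch that still predicts on the whole table in each pass but only lowers the price of rows that are still below the threshold, so rows that already qualify are left alone. The names `threshold`, `max_iter`, `below` and `original` are introduced for illustration, and the cap of 1000 passes is an arbitrary safety limit, not something from the question.

```r
# Same toy data and model as in the question.
train <- data.frame(
  cost  = c(120, 3, 2, 4, 10, 110, 200, 43, 1, 51, 22, 14),
  price = c(120, 20, 10, 4, 3, 4, 30, 43, 56, 88, 75, 44),
  dich  = factor(c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0))
)
test <- data.frame(
  cost  = c(13, 5, 32, 22, 14, 145, 54, 134, 11, 14, 33, 21),
  price = c(32, 11, 210, 6, 3, 7, 22, 423, 19, 99, 192, 32)
)
model <- glm(dich ~ cost + price, data = train, family = "binomial")

original  <- test   # keep a copy to compare against afterwards
threshold <- 0.05
max_iter  <- 1000   # hypothetical safety cap so the loop cannot run forever

for (i in seq_len(max_iter)) {
  pred  <- predict(model, test, type = "response")
  below <- pred < threshold          # rows that still need a lower price
  if (!any(below)) break
  test$price[below] <- test$price[below] - 1
}

test$pred_prob <- predict(model, test, type = "response")
print(test)
```

With this masking, the other rows are no longer dragged into negative prices while the slowest row catches up.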
Answer 2 (score: 0)
In case anyone else wants to run the same thing with an xgboost model.
library(xgboost)
train <- data.frame('cost'= c(120, 3, 2, 4, 10, 110, 200, 43, 1, 51, 22, 14),
'price' = c(120, 20, 10, 4, 3, 4, 30, 43, 56, 88, 75, 44))
label <- data.frame('dich' = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0))
train <- as.matrix(train)
label <- as.matrix(label)
model <- xgboost(data = train,
label = label,
max.depth = 3,
nround = 1,
objective = "binary:logistic")
test <- data.frame('cost'= c(13, 5, 32, 22, 14, 145, 54, 134, 11, 14, 33, 21),
'price' = c(32, 11, 210, 6, 3, 7, 22, 423, 19, 99, 192, 32)
)
test <- as.matrix(test)
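# For a matrix: add an empty column to hold the predicted probability.
test <- cbind(test, pred_prob = NA_real_)
feature_cols <- c("cost", "price")
for (n in seq_len(nrow(test))) {
  print("-----------------------------")
  print(n)
  while (TRUE) {
    # Pass only the columns the model was trained on, as a 1-row matrix.
    # binary:logistic already returns probabilities.
    pred <- predict(model, test[n, feature_cols, drop = FALSE])
    print(pred)
    test[n, "pred_prob"] <- pred
    if (pred >= 0.5) {
      print(test[n, "pred_prob"])
      break
    }
    # Note: this only terminates if some lower price reaches a prediction of 0.5 or more.
    test[n, "price"] <- test[n, "price"] - .01
  }
  print(test)
}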
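Running the 12 rows seems to take a while. I need to think some more about the threshold for a tree model, and about how a range of different price changes affects the prediction, in order to get a probability equal to or greater than .5 (that is what I meant in the original question, but I wrote .05, haha).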
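One reason the loop can be slow is that a tree model's prediction is piecewise constant in price, so many small steps produce no change at all, and a row may never reach the threshold. Below is a rough sketch of a bounded search that gives up at a price floor or after a fixed number of steps. Everything in it is hypothetical: `lower_price_until`, `min_price`, `max_iter` and the stand-in `toy_tree` function are illustrations, not part of xgboost. With a real model, `predict_fn` would wrap the `predict()` call.

```r
# Hypothetical helper: lower one row's price until predict_fn() reaches the
# threshold, stopping at min_price or after max_iter attempts.
lower_price_until <- function(predict_fn, cost, price, threshold = 0.5,
                              step = 0.01, min_price = 0, max_iter = 1e5) {
  for (i in seq_len(max_iter)) {
    prob <- predict_fn(cost, price)
    if (prob >= threshold) {
      return(list(price = price, prob = prob, reached = TRUE))
    }
    if (price - step < min_price) break   # do not go below the floor
    price <- price - step
  }
  list(price = price, prob = predict_fn(cost, price), reached = FALSE)
}

# Stand-in for a tree model (an assumption, not real xgboost output):
# the score only changes when price or cost crosses a split point.
toy_tree <- function(cost, price) {
  if (price < 10) 0.7 else if (cost > 100) 0.6 else 0.3
}

# A row that reaches the threshold once price drops below the split at 10.
hit <- lower_price_until(toy_tree, cost = 13, price = 32, step = 1)
print(hit)

# A row that can never reach it: the search stops at the price floor.
miss <- lower_price_until(function(cost, price) 0.3, cost = 13, price = 32, step = 1)
print(miss)
```

The `reached` flag makes it easy to separate rows that actually hit the threshold from rows where no price within the allowed range is enough.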