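I am trying to find the precision value that corresponds to a cutoff threshold of 0.5, as part of evaluating my model (logistic regression).
I get numeric(0) instead of the y value.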
library(ROCR)
y_hat = predict(mdl, newdata=ds_ts, type="response")
pred = prediction(y_hat, ds_ts$popularity)
perfPrc = performance(pred, "prec")
xPrc = perfPrc@x.values[[1]]
yPrc = perfPrc@y.values[[1]]
# Find the precision value corresponding to a cutoff threshold of 0.5
prc = yPrc[c(0.5000188)] # perfPrc isn't continuous - closest value to 0.5
prc # output is 'numeric(0)'
Answer 0 (score: 1)
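Try this, assuming you have the model object mdl and that your response variable popularity has 2 levels, 1 (positive) and 0. It applies the definition of precision directly. (You could instead use an approximate non-parametric method, such as a kNN-based one that aggregates the precision values at nearby cutoffs, or fit a curve Precision = f(Cutoff) to estimate the precision at a cutoff that is not in the list, but either would be approximate, whereas applying the definition gives the exact result.)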
p <- predict(mdl, newdata=ds_ts, type='response') # compute the prob that the output class label is 1
test_cut_off <- 0.5 # this is the cut off value for which you want to find precision
preds <- ifelse(p > test_cut_off, 1, 0) # find the class labels predicted with the new cut off
prec <- sum((preds == 1) & (ds_ts$popularity == 1)) / sum(preds == 1) # TP / (TP + FP)
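The same calculation can be wrapped in a small helper so it can be reused for any cutoff. This is only a sketch: the function name precision_at_cutoff, its arguments, and the example vectors are hypothetical, and it assumes the same "predict 1 when the probability is greater than the cutoff" rule as the lines above. It also returns NA when nothing is predicted positive, since TP / (TP + FP) is undefined in that case.

```r
# Precision by definition: TP / (TP + FP), for the rule "predict 1 when prob > cutoff"
# (hypothetical helper, not part of ROCR)
precision_at_cutoff <- function(prob, actual, cutoff = 0.5, positive = 1) {
  predicted_pos <- prob > cutoff
  n_pred_pos <- sum(predicted_pos)
  if (n_pred_pos == 0) return(NA_real_)  # no positive predictions: precision is undefined
  sum(predicted_pos & actual == positive) / n_pred_pos
}

# Made-up example values (illustrative only)
prob   <- c(0.2, 0.6, 0.7, 0.4, 0.9)
actual <- c(0,   1,   0,   1,   1)
example_prec <- precision_at_cutoff(prob, actual, cutoff = 0.5)
example_prec
```

With your own data the call would look like precision_at_cutoff(p, ds_ts$popularity, 0.5).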
[EDITED] Try the following simple experiment with randomly generated data (you can test it with your own data).
library(ROCR) # prediction() and performance() come from the ROCR package
set.seed(1234)
ds_ts <- data.frame(x=rnorm(100), popularity=sample(0:1, 100, replace=TRUE))
mdl <- glm(popularity~x, ds_ts, family=binomial())
y_hat = predict(mdl, newdata=ds_ts, type="response")
pred = prediction(y_hat, ds_ts$popularity)
perfPrc = performance(pred, "prec")
xPrc = perfPrc@x.values[[1]]
yPrc = perfPrc@y.values[[1]]
plot(xPrc, yPrc, pch=19)
test_cut_off <- 0.5 # this is the cut off value for which you want to find precision
# Try to find the precision value that corresponds to the cutoff threshold; 0.5 is not among the cutoffs, so it cannot be retrieved this way
prc = yPrc[c(test_cut_off)] # perfPrc isn't continuous
prc
# numeric(0)
# workaround: 1-NN, use the precision at the nearest cutoff as an approximate precision; the cutoff you used should work
nearest_cutoff_index <- which.min(abs(xPrc - test_cut_off))
approx_prec_at_cutoff <- yPrc[nearest_cutoff_index]
approx_prec_at_cutoff
# [1] 0.5294118
points(test_cut_off, approx_prec_at_cutoff, pch=19, col='red', cex=2)
The red point marks the approximate precision (if you are lucky, it may be exactly the same as the actual precision).
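One reason the two can coincide: for a fixed decision rule, precision is a step function of the cutoff and only changes when the cutoff crosses one of the observed predicted probabilities. The sketch below illustrates this on made-up numbers; scores, labels, and prec_at are hypothetical names, and the "greater than" rule is an assumption chosen to match the ifelse() call used earlier.

```r
# Made-up predicted probabilities and true labels (illustrative only)
scores <- c(0.10, 0.35, 0.48, 0.52, 0.70, 0.90)
labels <- c(0,    1,    0,    1,    0,    1)

# Precision by definition for the rule "predict 1 when score > cutoff"
prec_at <- function(cutoff) {
  pred_pos <- scores > cutoff
  sum(pred_pos & labels == 1) / sum(pred_pos)
}

# Cutoffs that all lie between the neighbouring scores 0.48 and 0.52
# select the same predicted positives, so the precision does not change
step_vals <- sapply(c(0.49, 0.50, 0.51), prec_at)
step_vals
```

So when the nearest tabulated cutoff falls on the same step as 0.5, the approximation is exact; when it falls on a neighbouring step, it is not.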
# use average precision from k-NN
k <- 3 # 3-NN
nearest_cutoff_indices <- sort(abs(xPrc - test_cut_off), index.return=TRUE)$ix[1:k]
approx_prec_at_cutoff <- mean(yPrc[nearest_cutoff_indices])
approx_prec_at_cutoff
# [1] 0.5294881
points(test_cut_off, approx_prec_at_cutoff, pch=19, col='red', cex=2)
# actual precision computed from the definition, for comparison with the approximations
p <- predict(mdl, newdata=ds_ts, type='response')
preds <- ifelse(p > 0.5000188, 1, 0)
actual_prec_at_cutoff <- sum((preds == 1) & (ds_ts$popularity == 1)) / sum(preds == 1) # TP / (TP + FP)
actual_prec_at_cutoff
# [1] 0.5294118
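As for why the original indexing returns numeric(0): square brackets in R select by position, not by x value. A fractional index is truncated toward zero, so 0.5000188 becomes 0, and an index of 0 selects nothing. A minimal sketch of that behavior and of a value-based lookup, using made-up stand-in vectors x and y rather than real ROCR output:

```r
x <- c(0.30, 0.45, 0.5000188, 0.62)  # stand-in for the cutoff values
y <- c(0.40, 0.50, 0.53, 0.60)       # stand-in for the precision values

# Positional indexing: 0.5000188 is truncated to 0, which selects no elements
by_position <- y[0.5000188]
length(by_position)

# Value-based lookup: find the position whose cutoff is closest to the target
idx <- which.min(abs(x - 0.5000188))
by_value <- y[idx]
by_value
```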