Finding the Y value corresponding to a particular X

Time: 2017-01-24 20:18:39

Tags: r

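As part of evaluating my model (logistic regression), I am trying to find the precision value that corresponds to a cutoff threshold of 0.5. I get numeric(0) instead of the Y value.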

library(ROCR)

y_hat = predict(mdl, newdata=ds_ts, type="response")
pred  = prediction(y_hat, ds_ts$popularity)
perfPrc  = performance(pred, "prec")
xPrc = perfPrc@x.values[[1]]
yPrc = perfPrc@y.values[[1]]

# Find the precision value corresponding to a cutoff threshold of 0.5
prc = yPrc[c(0.5000188)] # perfPrc isn't continuous - closest value to 0.5
prc # output is 'numeric(0)'

Precision vs Cutoff

1 Answer:

Answer 0 (score: 1)

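Try this, assuming you have the model object mdl and that your response variable popularity has two levels, 1 (positive) and 0. It applies the definition of precision directly. You could instead use a kNN-based non-parametric approximation that aggregates the precision values at the nearby available cutoffs, or fit a curve Precision = f(Cutoff) to estimate the precision at an unseen cutoff, but both are approximate; applying the definition gives the exact result: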

p <- predict(mdl, newdata=ds_ts, type='response') # compute the prob that the output class label is 1
test_cut_off <- 0.5 # this is the cut off value for which you want to find precision
preds <- ifelse(p > test_cut_off, 1, 0) # find the class labels predicted with the new cut off
prec <-  sum((preds == 1) & (ds_ts$popularity == 1)) /  sum(preds == 1) # TP / (TP + FP)
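
The same calculation can be wrapped in a small function so it can be reused for several cutoffs. This is only a sketch: `precision_at` is a hypothetical helper name (it is not part of ROCR), the toy vectors are made up for illustration, and returning NA when nothing is predicted positive is an assumption about how you want that edge case handled.

```r
# Hypothetical helper (not part of ROCR): precision at a single cutoff.
precision_at <- function(prob, actual, cutoff, positive = 1) {
  predicted_pos <- prob > cutoff             # same rule as the ifelse() above
  if (!any(predicted_pos)) return(NA_real_)  # TP / (TP + FP) is undefined here
  sum(predicted_pos & actual == positive) / sum(predicted_pos)
}

# Toy data, made up for illustration only
prob   <- c(0.9, 0.8, 0.6, 0.4, 0.2)
actual <- c(1,   0,   1,   1,   0)

precision_at(prob, actual, 0.5)   # 2 of the 3 predicted positives are correct
precision_at(prob, actual, 0.95)  # NA: no observation is predicted positive
```

With the objects from the snippet above, the call would be `precision_at(p, ds_ts$popularity, test_cut_off)`.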

[EDITED] Try the following simple experiment with randomly generated data (you can test it with your own data).

library(ROCR) # provides prediction() and performance()
set.seed(1234)
ds_ts <- data.frame(x=rnorm(100), popularity=sample(0:1, 100, replace=TRUE))
mdl <- glm(popularity~x, ds_ts, family=binomial())
y_hat = predict(mdl, newdata=ds_ts, type="response")
pred  = prediction(y_hat, ds_ts$popularity)  
perfPrc  = performance(pred, "prec")           
xPrc = perfPrc@x.values[[1]]
yPrc = perfPrc@y.values[[1]]
plot(xPrc, yPrc, pch=19)


test_cut_off <- 0.5 # this is the cutoff value for which you want to find the precision

# Indexing yPrc by the cutoff value does not work: `[` expects a position, not a cutoff,
# and 0.5 is truncated to index 0, which selects nothing
prc = yPrc[c(test_cut_off)] # the cutoffs in perfPrc are discrete
prc
# numeric(0)
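
The numeric(0) result comes from how R handles numeric indices and has nothing to do with ROCR. A minimal illustration with a made-up vector `v`:

```r
v <- c(10, 20, 30)

v[0.5]  # numeric(0): 0.5 is truncated to index 0, which selects nothing
v[1.9]  # 10: the index is truncated toward zero, so this is v[1]
v[0]    # numeric(0)
```

So `yPrc[0.5000188]` asks for position 0 of yPrc. It does not ask for the precision whose cutoff equals 0.5000188; that lookup has to go through xPrc.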

# workaround: 1-NN, use the precision at the nearest available cutoff as an approximation; the cutoff you used (0.5000188) should work
nearest_cutoff_index <- which.min(abs(xPrc - test_cut_off))
approx_prec_at_cutoff <- yPrc[nearest_cutoff_index]
approx_prec_at_cutoff
# [1] 0.5294118
points(test_cut_off, approx_prec_at_cutoff, pch=19, col='red', cex=2)


The red point shows the approximate precision (if you are lucky, it may be exactly equal to the actual precision).

# use average precision from k-NN
k <- 3 # 3-NN
nearest_cutoff_indices <- sort(abs(xPrc - test_cut_off), index.return=TRUE)$ix[1:k]
approx_prec_at_cutoff <- mean(yPrc[nearest_cutoff_indices])
approx_prec_at_cutoff
# [1] 0.5294881
points(test_cut_off, approx_prec_at_cutoff, pch=19, col='red', cex=2)


p <- predict(mdl, newdata=ds_ts, type='response')
preds <- ifelse(p > 0.5000188, 1, 0)
actual_prec_at_cutoff <-  sum((preds == 1) & (ds_ts$popularity == 1)) /  sum(preds == 1) # TP / (TP + FP)
actual_prec_at_cutoff
# [1] 0.5294118
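
For the curve-fitting alternative mentioned at the start of this answer, one simple option is linear interpolation between the two nearest cutoffs with base R's `approx()`. This is a sketch under stated assumptions: the `cutoffs` and `precision` vectors below are made-up stand-ins for xPrc and yPrc, and the result is only an approximation, unlike the definition-based calculation.

```r
# Made-up cutoff/precision pairs standing in for xPrc and yPrc
cutoffs   <- c(Inf, 0.9, 0.7, 0.45, 0.3)
precision <- c(NaN, 1.0, 0.8, 0.6,  0.5)

# Drop non-finite entries (e.g. an infinite cutoff with undefined precision)
ok <- is.finite(cutoffs) & is.finite(precision)

# Linear interpolation of precision at the cutoff of interest
interp <- approx(cutoffs[ok], precision[ok], xout = 0.5)$y
interp  # 0.64, interpolated between the points (0.45, 0.6) and (0.7, 0.8)
```

Precision is not a smooth function of the cutoff, so treat the interpolated value as a rough estimate and prefer the TP / (TP + FP) calculation when the predictions are available.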