Question

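I am running a logistic regression on the Boston data using the column high.medv (yes/no), which indicates whether the median house price given by the column medv is greater than 25.

Below is my logistic regression code.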

# Applying the desired condition to medv and storing the results into a new variable called "medv.high"
high.medv <- ifelse(Boston$medv>25, "Y", "N")
ourBoston <- data.frame (Boston, high.medv)
ourBoston$high.medv <- as.factor(ourBoston$high.medv)
attach(Boston)
# 70% of data <- Train
train2<- subset(ourBoston,sample==TRUE)
# 30% will be Test
test2<- subset(ourBoston, sample==FALSE)
glm.fit <- glm (high.medv ~ lstat,data = train2, family = binomial)
summary(glm.fit)

The output is as follows:

Deviance Residuals: 
[1]  0

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -22.57   48196.14       0        1
lstat             NA         NA      NA       NA

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 0.0000e+00  on 0  degrees of freedom
Residual deviance: 3.1675e-10  on 0  degrees of freedom
AIC: 2

Number of Fisher Scoring iterations: 21

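I also need to use the misclassification rate as the error measure for two cases:

using lstat as the predictor,

using all predictors except high.medv and medv.

But I am stuck on the regression itself.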

Answer 1

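For every classification algorithm, the art lies in choosing the threshold at which you decide whether a result is positive or negative.

When you predict outcomes on the test dataset, you estimate the probability that the response variable is 1 or 0. You therefore need to say where you are going to cut, i.e. the threshold at which the prediction becomes 1 or 0.

A high threshold is more conservative about labelling a case as positive, which makes it less likely to produce false positives and more likely to produce false negatives. A low threshold does the opposite.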
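The effect of moving the threshold can be seen in a minimal sketch. The probabilities and labels below are simulated purely for illustration (they are not the Boston data), and the names p_hat, y and errors_at are hypothetical.

```r
# Illustrative only: simulated probabilities and labels, not the Boston data.
set.seed(1)
p_hat <- runif(200)              # hypothetical predicted probabilities
y     <- rbinom(200, 1, p_hat)   # labels drawn so that p_hat is meaningful

# Count false positives and false negatives at a given cut-off.
errors_at <- function(threshold) {
  pred <- as.integer(p_hat > threshold)
  c(threshold       = threshold,
    false_positives = sum(pred == 1 & y == 0),
    false_negatives = sum(pred == 0 & y == 1))
}

# Compare a low, a middle, and a high threshold.
res <- t(sapply(c(0.2, 0.5, 0.8), errors_at))
print(res)
```

Reading the rows from top to bottom, the false-positive count can only stay the same or fall as the threshold rises, while the false-negative count can only stay the same or grow.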

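The usual procedure is to plot the rates you are interested in, e.g. true positives against false positives, and then choose the trade-off that suits you best.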

set.seed(666)

# simulation of logistic data
x1 = rnorm(1000)           # some continuous variables
z = 1 + 2*x1               # linear combination with a bias
pr = 1/(1 + exp(-z))       # pass through an inv-logit function
y = rbinom(1000, 1, pr)

df = data.frame(y = y, x1 = x1)

# mark a random two thirds of the rows as training data
df$train = 0
df$train[sample(nrow(df), floor(2*nrow(df)/3))] = 1
df$new_y = NA

# modelling the response variable
mod = glm(y ~ x1, data = df[df$train == 1,], family = "binomial")
df$new_y[df$train == 0] = predict(mod, newdata = df[df$train == 0,], type = 'response') # predicted probabilities
dat = df[df$train==0,] # test data

To evaluate your model with the misclassification error, you first need to set a threshold. For this you can use the roc function from the pROC package, which computes the rates and provides the corresponding thresholds:

library(pROC)

rates =roc(dat$y, dat$new_y)
plot(rates) # visualize the trade-off
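Once the rates are available, one possible way to reduce them to a single cut-off is to maximise Youden's J (true-positive rate minus false-positive rate). The answer does not prescribe a criterion, so this choice is an assumption, as are the simulated data and the names p_hat, rates_tbl and best. The sketch uses base R only instead of the pROC object.

```r
# Simulated stand-ins for dat$y and dat$new_y (assumption, not the answer's data).
set.seed(666)
x1    <- rnorm(300)
p_hat <- 1 / (1 + exp(-(1 + 2 * x1)))   # stand-in for predicted probabilities
y     <- rbinom(300, 1, p_hat)          # observed 0/1 outcomes

# Candidate thresholds: a regular grid inside (0, 1).
thresholds <- seq(0.05, 0.95, by = 0.05)

# True-positive and false-positive rate at each threshold.
rates_tbl <- data.frame(
  threshold = thresholds,
  tpr = sapply(thresholds, function(t) mean(p_hat[y == 1] > t)),
  fpr = sapply(thresholds, function(t) mean(p_hat[y == 0] > t))
)

# Youden's J = TPR - FPR; one possible criterion, chosen here for illustration.
rates_tbl$youden <- rates_tbl$tpr - rates_tbl$fpr
best <- rates_tbl[which.max(rates_tbl$youden), ]
print(best)
```

If false positives and false negatives have different costs, a different criterion (for example, a cap on the false-positive rate) may be more appropriate than Youden's J.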

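The accuracy of your model can be computed as the ratio of the number of observations you classified correctly to the sample size.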
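Applied to the two cases in the question, the misclassification rate is one minus that accuracy. The following is a rough sketch, not a definitive solution: it assumes the MASS package (which ships the Boston data) is available, builds the 70/30 split from an explicit random row index because the question does not show how its sample variable was created, and uses a threshold of 0.5 as a placeholder. The helper misclass_rate and the seed are hypothetical choices.

```r
library(MASS)   # provides the Boston data set

set.seed(123)   # arbitrary seed, for reproducibility only
ourBoston <- Boston
ourBoston$high.medv <- factor(ifelse(ourBoston$medv > 25, "Y", "N"))

# Random 70/30 split by row index (an assumed way of building the split).
train_idx <- sample(nrow(ourBoston), size = floor(0.7 * nrow(ourBoston)))
train2 <- ourBoston[train_idx, ]
test2  <- ourBoston[-train_idx, ]

# Misclassification rate of a fitted model on new data at a given threshold.
misclass_rate <- function(fit, newdata, threshold = 0.5) {
  # With factor levels N < Y, glm models the probability of "Y".
  p    <- predict(fit, newdata = newdata, type = "response")
  pred <- ifelse(p > threshold, "Y", "N")
  mean(pred != newdata$high.medv)
}

# Case 1: lstat only.
fit_lstat <- glm(high.medv ~ lstat, data = train2, family = binomial)
# Case 2: every predictor except medv (high.medv itself is the response).
fit_all   <- glm(high.medv ~ . - medv, data = train2, family = binomial)

err_lstat <- misclass_rate(fit_lstat, test2)
err_all   <- misclass_rate(fit_all, test2)
print(c(lstat_only = err_lstat, all_predictors = err_all))
```

The 0.5 cut-off can be replaced by whatever threshold the ROC analysis suggests by passing it as the threshold argument.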
