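How can I extend a logistic regression plot?

Time: 2019-09-20 10:10:32

Tags: r ggplot2 dplyr logistic-regression caret

Here is the plot I created. I fitted a logistic model in R; the problem is that my maximum x value is 0.85, so the plot stops at that value.

Is it possible to extend the curve to x = 100, with the y values computed from the logistic model?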


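I tried extending the x values to 100, but that only extended the axis. The corresponding y values were not computed, so nothing was drawn in that range.

1 Answer:

Answer 0: (score: 3)

I can't reproduce your data, so I will use the Challenger disaster example with a confidence-interval ribbon (see this LINK) to show how to handle it.

You should create artificial points across the x range you want to show and predict them with the fitted model before plotting.

Next time, try using reprex or provide a reproducible example.

Data preparation and model fitting: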

library(dplyr)

fails <- c(2, 0, 0, 1, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)

temp <- c(53, 66, 68, 70, 75, 78, 57, 67, 69, 70, 75, 79, 58, 67, 70, 72, 76, 80, 63, 67, 70, 73, 76)

challenger <- tibble::tibble(fails, temp)

orings = 6
challenger <- challenger %>%
  dplyr::mutate(resp = fails/orings)

model_fit <- glm(resp ~ temp, 
                 data = challenger, 
                 weights = rep(6, nrow(challenger)),
                 family=binomial(link="logit"))

##### ------- this is what you need: -------------------------------------------

# setting limits for x axis
x_limits <- challenger %>%
  dplyr::summarise(min = 0, max = max(temp)+10)

# creating artificial obs for curve smoothing -- several points between the limits
x <- seq(x_limits[[1]], x_limits[[2]], by=0.5)

# artificial points prediction
# see: https://stackoverflow.com/questions/26694931/how-to-plot-logit-and-probit-in-ggplot2
temp.data = data.frame(temp = x) #column name must be equal to the variable name

# Predict the fitted values given the model and hypothetical data
predicted.data <- as.data.frame(
  predict(model_fit, 
          newdata = temp.data, 
          type = "link", se.fit = TRUE)  # se.fit spelled out instead of relying on partial argument matching
  )

# Combine the hypothetical data and predicted values
new.data <- cbind(temp.data, predicted.data)
##### --------------------------------------------------------------------------

# Compute confidence intervals
std <- qnorm(0.95 / 2 + 0.5)
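# predict() names the standard-error column se.fit; use the full name rather than partial matching
new.data$ymin <- model_fit$family$linkinv(new.data$fit - std * new.data$se.fit)
new.data$ymax <- model_fit$family$linkinv(new.data$fit + std * new.data$se.fit)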
new.data$fit <- model_fit$family$linkinv(new.data$fit)  # Rescale to 0-1
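
The interval above is built on the link (log-odds) scale and only then mapped back to probabilities. The following is a minimal sketch of why that order matters, reusing the same Challenger data. The names grid, link_lo, link_hi, resp_lo and resp_hi are made up for this illustration, and plogis() is used as the inverse logit in place of model_fit$family$linkinv.

```r
# Same Challenger data as in the answer
fails <- c(2, 0, 0, 1, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
temp  <- c(53, 66, 68, 70, 75, 78, 57, 67, 69, 70, 75, 79, 58, 67, 70, 72, 76, 80, 63, 67, 70, 73, 76)
challenger <- data.frame(temp = temp, resp = fails / 6)

model_fit <- glm(resp ~ temp,
                 data = challenger,
                 weights = rep(6, nrow(challenger)),
                 family = binomial(link = "logit"))

# Hypothetical prediction grid covering the extended x range
grid <- data.frame(temp = seq(0, 90, by = 0.5))
z <- qnorm(0.975)  # two-sided 95% normal quantile

# Interval built on the link scale, then back-transformed with the inverse logit
link_pred <- predict(model_fit, newdata = grid, type = "link", se.fit = TRUE)
link_lo <- plogis(link_pred$fit - z * link_pred$se.fit)
link_hi <- plogis(link_pred$fit + z * link_pred$se.fit)

# Interval built directly on the probability scale, for comparison only
resp_pred <- predict(model_fit, newdata = grid, type = "response", se.fit = TRUE)
resp_lo <- resp_pred$fit - z * resp_pred$se.fit
resp_hi <- resp_pred$fit + z * resp_pred$se.fit

range(c(link_lo, link_hi))  # bounded by 0 and 1 by construction
range(c(resp_lo, resp_hi))  # nothing forces these bounds to stay within 0 and 1
```

Because the inverse logit maps any real number into the unit interval, the back-transformed bounds are always valid probabilities, which is not guaranteed for bounds computed directly on the response scale.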

Plot:


library(ggplot2)

plotly_palette <- c('#1F77B4', '#FF7F0E', '#2CA02C', '#D62728')

p <- ggplot(challenger, aes(x=temp, y=resp))+ 
  geom_point(colour = plotly_palette[1])+ 
  geom_ribbon(data=new.data, 
              aes(y=fit, ymin=ymin, ymax=ymax), 
              alpha = 0.5, 
              fill = '#FFF0F5')+
  geom_line(data=new.data, aes(y=fit), colour = plotly_palette[2]) + 
  labs(x="Temperature", y="Estimated Fail Probability")+
  ggtitle("Predicted Probabilities for fail/orings with 95% Confidence Interval")+
  theme_bw()+
  theme(panel.border = element_blank(), plot.title = element_text(hjust=0.5))

p

# if you want something fancier:
# library(plotly)
# ggplotly(p)

Result:

[Image: the resulting plot]
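
The approach above builds the prediction grid by hand. As a hedged alternative (not what this answer uses), ggplot2 can refit the model inside geom_smooth() and extend the curve to the axis limits with fullrange = TRUE. The sketch below assumes the same Challenger data. The orings column and the object name p_alt are introduced only for this example, and the 0 to 90 limits are an arbitrary choice.

```r
library(ggplot2)

# Same Challenger data as in the answer
fails <- c(2, 0, 0, 1, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
temp  <- c(53, 66, 68, 70, 75, 78, 57, 67, 69, 70, 75, 79, 58, 67, 70, 72, 76, 80, 63, 67, 70, 73, 76)
challenger <- data.frame(temp = temp, resp = fails / 6, orings = 6)

p_alt <- ggplot(challenger, aes(x = temp, y = resp)) +
  geom_point() +
  geom_smooth(aes(weight = orings),          # number of trials behind each proportion
              method = "glm",
              formula = y ~ x,
              method.args = list(family = binomial(link = "logit")),
              fullrange = TRUE) +            # draw the fit over the whole x scale, not just the data
  xlim(0, 90) +                              # the x range the curve should cover
  labs(x = "Temperature", y = "Estimated Fail Probability") +
  theme_bw()

p_alt
```

This is shorter, but the manual grid gives direct access to the predicted values and interval bounds, which is useful if you need them outside the plot.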

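Fun fact about the Challenger data:

NASA engineers used linear regression to estimate the probability of O-ring failure. Had they used a technique better suited to the data, such as logistic regression, they would have noticed that the probability of failure at lower temperatures (for example, ~36F at launch) was very high. The plot shows that at ~36F, a temperature extrapolated well below those observed, the probability is ~0.75. If we take the confidence interval into account... well, the accident was almost certain.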
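
The ~36F figure can also be checked numerically rather than read off the plot. The following is a minimal sketch that refits the same model and predicts a single point. The names launch and launch_ci are hypothetical, and 36 is the approximate launch temperature quoted above.

```r
# Same Challenger data and model as in the answer
fails <- c(2, 0, 0, 1, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
temp  <- c(53, 66, 68, 70, 75, 78, 57, 67, 69, 70, 75, 79, 58, 67, 70, 72, 76, 80, 63, 67, 70, 73, 76)
challenger <- data.frame(temp = temp, resp = fails / 6)

model_fit <- glm(resp ~ temp,
                 data = challenger,
                 weights = rep(6, nrow(challenger)),
                 family = binomial(link = "logit"))

# Single artificial observation at the approximate launch temperature
launch <- data.frame(temp = 36)
pred <- predict(model_fit, newdata = launch, type = "link", se.fit = TRUE)
z <- qnorm(0.975)  # two-sided 95% normal quantile

# Point estimate and interval, back-transformed from the link scale
launch_ci <- data.frame(
  temp     = launch$temp,
  estimate = plogis(pred$fit),
  lower    = plogis(pred$fit - z * pred$se.fit),
  upper    = plogis(pred$fit + z * pred$se.fit)
)
launch_ci
```

Keep in mind that 36F lies far below the coldest observed launch (53F), so this is an extrapolation and the interval reflects only the model's own uncertainty.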