How do I plot a vector of bootstrapped slopes in ggplot2?

Asked: 2018-10-12 21:48:57

Tags: r ggplot2 statistics

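I have been using ggplot2 to plot the results of bootstrapping various statistics (for example, correlation coefficients). Recently I bootstrapped the slope of a linear regression model. Here is how it looks using the plot() function from the graphics package: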

spot_color <- "red" # not defined in the original post; "red" is an assumed value

plot(waiting ~ eruptions, 
     data = faithful, 
     main = "Relationship Between Eruption Length and Wait Time at\nOld Faithful With Bootstrapped Regression Lines", 
     xlab = "Eruption Length (minutes)", 
     ylab = "Wait Time (minutes)", 
     col = spot_color, 
     pch = 19)

index <- 1:nrow(faithful)
for (i in 1:10000) {
    index_boot <- sample(index, replace = TRUE) # getting a bootstrap sample (of indices)
    faithful_boot <- faithful[index_boot, ]
    # Fitting the linear model to the bootstrapped data:
    fit.boot <- lm(waiting ~ eruptions, data = faithful_boot)
    abline(fit.boot, lwd = 0.1, col = rgb(0, 0.1, 0.25, alpha = 0.05)) # Add line to plot
}
fit <- lm(waiting ~ eruptions, data=faithful)
abline(fit, lwd = 2.5, col = "blue")

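This works, but it depends on a workflow in which we first create the plot and then add the lines inside a loop. I would rather use a function to create a list of slopes and then plot them all in ggplot2.

For example, the function might look like this: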

set.seed(777) # included so the following output is reproducible
n_resample <- 10000 # set the number of times to resample the data

# First argument is the data; second is the number of resampled datasets
bootstrap <- function(df, n_resample) {
    slope_resample <- numeric(n_resample) # initialize a vector to hold the slopes
    index <- 1:nrow(df) # create an index for supplied table

    for (i in 1:n_resample) {
        index_boot <- sample(index, replace = TRUE) # sample row numbers, with replacement
        df_boot <- df[index_boot, ] # create a bootstrap sample from original data
        a <- lm(waiting ~ eruptions, data = df_boot) # fit the linear model
        slope_resample[i] <- coef(a)[["eruptions"]] # take the slope
    }
    return(slope_resample) # return the vector of bootstrapped slopes
}

bootstrapped_slopes <- bootstrap(faithful, n_resample)

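But how do I get geom_line() or geom_smooth() to take their data from bootstrapped_slopes? Any help would be much appreciated.

1 Answer:

Answer 0: (score: 1)

Edit: a more direct modification of the OP's code

For plotting, I assume you need both the slope and the intercept, so here is a modified bootstrap function: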

bootstrap <- function(df, n_resample) {
  # Note 2 columns here: one for the intercept, one for the slope
  coef_resample <- matrix(NA, nrow = n_resample, ncol = 2) # initialize matrix
  index <- 1:nrow(df) # create an index for supplied table

  for (i in 1:n_resample) {
    index_boot <- sample(index, replace = TRUE) # sample row numbers, with replacement
    df_boot <- df[index_boot, ] # create a bootstrap sample from original data
    a <- lm(waiting ~ eruptions, data = df_boot) # fit the linear model
    coef_resample[i, 1] <- a$coefficients[1] # take the intercept
    coef_resample[i, 2] <- a$coefficients[2] # take the slope
  }
  # Return a data frame with the intercepts (V1) and slopes (V2)
  return(as.data.frame(coef_resample))
}

Then run it and plot the lines from that data frame:

bootstrapped_slopes <- bootstrap(faithful, 10000)

library(dplyr); library(ggplot2)
ggplot(faithful, aes(eruptions, waiting)) +
  geom_abline(data = bootstrapped_slopes %>% 
                sample_n(1000), # 10k lines look about the same as 1k, just darker and slower
              aes(slope = V2, intercept = V1),
              alpha = 0.01) +
  geom_point(shape = 19, color = "red")
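
The base-graphics version in the question also draws the fit to the full data set on top of the bootstrapped lines. A minimal sketch of the same overlay in ggplot2 follows, assuming a data frame with V1 (intercept) and V2 (slope) columns like the one returned above. The name boot_coefs, the resample count of 200, the seed, and the styling are arbitrary choices for illustration, and the linewidth argument assumes ggplot2 3.4.0 or later (older versions use size).

```r
library(ggplot2)

set.seed(777) # arbitrary seed, only for reproducibility

# Small stand-in for the bootstrap() output: V1 = intercept, V2 = slope
boot_coefs <- as.data.frame(t(replicate(200, {
  rows <- sample(nrow(faithful), replace = TRUE)
  unname(coef(lm(waiting ~ eruptions, data = faithful[rows, ])))
})))

# Fit to the full data set, as in the base-graphics version
full_fit <- coef(lm(waiting ~ eruptions, data = faithful))

p <- ggplot(faithful, aes(eruptions, waiting)) +
  geom_abline(data = boot_coefs,
              aes(slope = V2, intercept = V1),
              alpha = 0.05) +
  geom_point(shape = 19, color = "red") +
  # Full-data regression line drawn last so it sits on top
  geom_abline(slope = full_fit[["eruptions"]],
              intercept = full_fit[["(Intercept)"]],
              color = "blue", linewidth = 1)
p
```

Drawing the full-data line as the last layer keeps it visible above the semi-transparent bootstrap lines and the points.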

Alternative solution

This can also be done with modelr and broom, which simplify some of the bootstrapping. Based on the main help example for modelr::bootstrap, we can do the following:
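
A minimal sketch of what that could look like, adapted from the modelr::bootstrap help example to the faithful data. The resample count, the seed, the reshaping step with tidyr::pivot_wider, the name lines_df, and the styling are assumptions for illustration rather than the answer's original code. It assumes modelr, purrr, broom, tidyr, dplyr and ggplot2 are installed.

```r
library(modelr)
library(purrr)
library(ggplot2)

set.seed(777) # arbitrary seed, only for reproducibility

# 1000 bootstrap resamples of faithful (list-column `strap`)
boot <- modelr::bootstrap(faithful, 1000)

# Fit one linear model per resample, then collect the coefficients
models <- map(boot$strap, ~ lm(waiting ~ eruptions, data = .))
tidied <- map_df(models, broom::tidy, .id = "id")

# One row per model, with columns `(Intercept)` and `eruptions` (the slope)
lines_df <- tidyr::pivot_wider(tidied,
                               id_cols = id,
                               names_from = term,
                               values_from = estimate)

p <- ggplot(faithful, aes(eruptions, waiting)) +
  geom_abline(data = lines_df,
              aes(slope = eruptions, intercept = `(Intercept)`),
              alpha = 0.01) +
  geom_point(shape = 19, color = "red")
p
```

Because broom::tidy returns one row per coefficient, the pivot step is what turns the output into the one-row-per-line shape that geom_abline expects.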
