R - dplyr bootstrap issue

Date: 2016-09-17 16:09:21

Tags: r dplyr statistics-bootstrap broom

I am having trouble understanding how to use the bootstrap function correctly with dplyr.

What I want is to generate a bootstrap distribution from two randomly assigned groups and compute the difference in their means, for example:

library(dplyr) 
library(broom) 
data(mtcars) 

mtcars %>% 
  mutate(treat = sample(c(0, 1), 32, replace = T)) %>% 
  group_by(treat) %>%
  summarise(m = mean(disp)) %>% 
  summarise(m = m[treat == 1] - m[treat == 0])

The problem is that I need to repeat this operation 100, 1000, or more times.

Using replicate, I can do

frep = function(mtcars) mtcars %>% 
  mutate(treat = sample(c(0, 1), 32, replace = T)) %>% 
  group_by(treat) %>%
  summarise(m = mean(disp)) %>% 
  summarise(m = m[treat == 1] - m[treat == 0])

replicate(1000, frep(mtcars = mtcars), simplify = T) %>% unlist()

and get the distribution.

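For reference, the replicate approach above can also return a plain numeric vector directly, which avoids the unlist() step. This is only a minimal sketch in base R: the helper name diff_in_means and the seed are assumptions added for illustration and are not part of the original code.

```r
# Hypothetical helper: randomly assign each row to treat 0 or 1,
# then return the difference in mean disp between the two groups.
diff_in_means <- function(df) {
  treat <- sample(c(0, 1), nrow(df), replace = TRUE)
  # Note: if one group happened to be empty, the result would be NaN;
  # with 32 rows this is extremely unlikely.
  mean(df$disp[treat == 1]) - mean(df$disp[treat == 0])
}

set.seed(42)  # assumption: fixed seed so the run is reproducible
diffs <- replicate(1000, diff_in_means(mtcars))

# diffs is a numeric vector of length 1000, one difference per repetition
summary(diffs)
```

Something like hist(diffs) would then plot the resulting distribution.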

I really don't know how to use bootstrap here. Where should I start?

mtcars %>% 
  bootstrap(10) %>% 
  mutate(treat = sample(c(0, 1), 32, replace = T)) 

mtcars %>% 
  bootstrap(10) %>% 
  do(tidy(treat = sample(c(0, 1), 32, replace = T))) 

Neither of these really works. Where in the pipeline should the bootstrap step go?

Thanks.

1 answer:

Answer 0: (score: 2)

In the do step, we wrap the data in data.frame and create the 'treat' column; then we can group by 'replicate' and 'treat' and summarise to get the output column.

mtcars %>% 
    bootstrap(10) %>% 
    do(data.frame(., treat = sample(c(0,1), 32, replace=TRUE))) %>% 
    group_by(replicate, treat) %>% 
    summarise(m = mean(disp)) %>%
    summarise(m = m[treat == 1] - m[treat == 0])
    # or, as treat == 0 comes first and treat == 1 second within each replicate, we can also use
    # summarise(m = last(m) - first(m))
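If bootstrap() is not available in the installed version of broom, the same idea can be sketched with dplyr and base R alone. This is a rough equivalent under stated assumptions, not what bootstrap() does internally: the names n_boot and boot_diffs and the seed are hypothetical, and each replicate is built by resampling the rows of mtcars with replacement by hand.

```r
library(dplyr)

set.seed(1)   # assumption: fixed seed so the sketch is reproducible
n_boot <- 10  # number of bootstrap replicates, as in bootstrap(10)

boot_diffs <- lapply(seq_len(n_boot), function(i) {
  # Resample the rows of mtcars with replacement (one bootstrap replicate)
  resampled <- mtcars[sample(nrow(mtcars), replace = TRUE), ]
  # Label the replicate and randomly assign each row to treat 0 or 1
  resampled %>%
    mutate(replicate = i,
           treat = sample(c(0, 1), n(), replace = TRUE))
}) %>%
  bind_rows() %>%
  group_by(replicate) %>%
  # Difference in mean disp between the two groups, per replicate
  summarise(m = mean(disp[treat == 1]) - mean(disp[treat == 0]))

boot_diffs
```

The result has one row per replicate, with the difference in means in column m. Computing both group means inside a single summarise avoids relying on the order of the treat groups.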