我有一个长数据集,它由多个插补产生的几个数据集组成(假设有10个插补)。他们有一个识别插补的id变量。在每个这些估算的数据集上,我想引导10个数据集。在引导程序之后,我想在每个上运行模型(100,插补靴子组合)。
在此示例中,我不确定是使用broom::bootstrap()
函数还是modelr::bootstrap()
函数。此外,分组似乎在我的管道中丢失了。
以下是使用mtcars数据集的可重现示例:
library(tidyverse)
library(broom)
cars <- mtcars %>%
mutate(am = as.factor(am)) %>% # This is standing in for my imputation id variable
group_by(am)
Source: local data frame [32 x 11]
Groups: am [2]
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fctr> <dbl> <dbl>
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
正如您所看到的,输出当前显示有两个组,应该如此。在我的数据集中,它会显示每个插补数据集有10个。现在:
cars2 <- cars %>%
broom::bootstrap(10, by_group = TRUE)
cars2
Source: local data frame [32 x 11]
Groups: replicate [10]
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fctr> <dbl> <dbl>
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
现在看起来好像只有10个组代表每个复制品。它似乎没有保留先前的分组。在这一点上,我预计会有20个小组(2 x 10)。
如果我现在这样做:
cars3 <- cars2 %>%
group_by(am)
cars3
Source: local data frame [32 x 11]
Groups: am [2]
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fctr> <dbl> <dbl>
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
现在似乎只复制了am
的组。
无论如何,在 之后,我已经对原始数据集进行了分组。另外,理想情况下,在我引导后,应该有一个id,指示我正在查看哪些bootrapped数据集。
在我的理想世界中,我的代码应该能够做到这样的事情:
cars <- mtcars %>%
mutate(am = as.factor(am)) %>%
group_by(am) %>%
bootstrap(10, by_group = TRUE) %>%
nest() %>% # create a condensed tidy dataset that has one row per imputation, bootstrap combo
mutate(model = map(data, ~lm(mpg~, data = .)) # Create a model for each row
答案 0 :(得分:4)
我正在尝试同时学习# This may take some tweaking, but here's the general idea
@api.multi
def copy(self, vals):
new_product = super(YourClass, self).copy(vals)
if vals.get('copy_sellers'):
new_product.seller_ids = self.seller_ids.copy({'product_id': new_product.id})
return new_product
# Whatever you have calling the copy method will need to include copy_sellers in vals
vals.update({'copy_sellers': True})
product.copy(vals)
和modelr
并且他们真的让我头疼。我想我终于想出了这个。
purrr
library(modelr)
library(dplyr)
library(tidyr)
library(broom)
mtcars %>% group_by(am) %>% do(rs = modelr::bootstrap(., 10))
Source: local data frame [2 x 2]
Groups: <by row>
# A tibble: 2 x 2
am rs
* <dbl> <list>
1 0 <tibble [10 x 2]>
2 1 <tibble [10 x 2]>
mtcars %>% group_by(am) %>% do(rs = modelr::bootstrap(., 10)) %>% group_by(am) %>% unnest
您必须在表带列上使用# A tibble: 20 x 3
# Groups: am [2]
am strap .id
<dbl> <list> <chr>
1 0 <S3: resample> 01
2 0 <S3: resample> 02
3 0 <S3: resample> 03
4 0 <S3: resample> 04
5 0 <S3: resample> 05
6 0 <S3: resample> 06
7 0 <S3: resample> 07
8 0 <S3: resample> 08
9 0 <S3: resample> 09
10 0 <S3: resample> 10
11 1 <S3: resample> 01
12 1 <S3: resample> 02
13 1 <S3: resample> 03
14 1 <S3: resample> 04
15 1 <S3: resample> 05
16 1 <S3: resample> 06
17 1 <S3: resample> 07
18 1 <S3: resample> 08
19 1 <S3: resample> 09
20 1 <S3: resample> 10
将其重新展开为可用数据。见as.data.frame
。这个让我永远想通了。它应该正常工作,如?resample
。
tidyr::unnest
mtcars %>% group_by(am) %>% do(rs = modelr::bootstrap(., 10)) %>% group_by(am) %>% unnest %>% group_by(am, .id) %>% do(model = lm(mpg~wt, data = as.data.frame(.$strap)))
Source: local data frame [20 x 3]
Groups: <by row>
# A tibble: 20 x 3
am .id model
* <dbl> <chr> <list>
1 0 01 <S3: lm>
2 0 02 <S3: lm>
3 0 03 <S3: lm>
4 0 04 <S3: lm>
5 0 05 <S3: lm>
6 0 06 <S3: lm>
7 0 07 <S3: lm>
8 0 08 <S3: lm>
9 0 09 <S3: lm>
10 0 10 <S3: lm>
11 1 01 <S3: lm>
12 1 02 <S3: lm>
13 1 03 <S3: lm>
14 1 04 <S3: lm>
15 1 05 <S3: lm>
16 1 06 <S3: lm>
17 1 07 <S3: lm>
18 1 08 <S3: lm>
19 1 09 <S3: lm>
20 1 10 <S3: lm>
mtcars %>% group_by(am) %>% do(rs = modelr::bootstrap(., 10)) %>% group_by(am) %>% unnest %>% group_by(am, .id) %>% do(model = lm(mpg~wt, data = as.data.frame(.$strap))) %>% tidy(model)
请注意,我将bootstraps的数量增加到1000,大约需要10秒。
# A tibble: 40 x 7
# Groups: am, .id [20]
am .id term estimate std.error statistic p.value
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 0 01 (Intercept) 25.800592 2.1055145 12.253818 7.300379e-10
2 0 01 wt -2.608827 0.5377694 -4.851201 1.497729e-04
3 0 02 (Intercept) 37.012664 4.7369213 7.813654 5.023424e-07
4 0 02 wt -5.272094 1.2884870 -4.091693 7.602571e-04
5 0 03 (Intercept) 26.145563 2.2114832 11.822637 1.263234e-09
6 0 03 wt -2.428845 0.5541412 -4.383080 4.056524e-04
7 0 04 (Intercept) 31.502481 4.0753463 7.730013 5.806324e-07
8 0 04 wt -3.584863 1.1510368 -3.114464 6.305972e-03
9 0 05 (Intercept) 31.739921 2.2216473 14.286661 6.690920e-11
10 0 05 wt -3.716515 0.5627808 -6.603841 4.471168e-06
# ... with 30 more rows