How can I get bootstrapped quantile estimates by group using dplyr?

Time: 2017-10-15 19:20:32

Tags: r dplyr plyr rms

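I am trying to match the output of the rms package function `smean.cl.boot` by bootstrapping confidence intervals with dplyr (Method 1). However, I can't work out how to bootstrap a single column inside a dplyr call. Can someone show me how to resample a single column, take the mean for each group, and finally estimate quantiles of that column?

Consider this small data set. I summarised the grouped data with the plyr package before estimating the quantiles, but for some reason my results differ from those of the rms package.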

require(rms)
require(dplyr)
require(plyr)

fish <- structure(list(wk = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 
2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 
5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 
8, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10), pd = c(317.308439683869, 
0, 126.719553152898, NA, NA, NA, NA, 2671.6, 3540.6976744186, 
1270.35740604274, 1067.69362430466, 688.099646524154, 317.444499806234, 
420.941879550524, 280.475476696762, 250.681324772507, 159.048160622895, 
258.125109208457, 450.868907331836, 0, 120.83949704142, 244.794377928162, 
0, 226.610029158717, 0, NA, NA, NA, 0, 0, 776.419523429887, 0, 
0, 5572.7956254272, NA, 0, 235.711495898971, 0, 0, 0, 0, 0, 0, 
158.796322731685, 0, 0, 0, 278.669954021457, 0, 0, 0, 0, 0, 0, 
0, NA, 623.451776649746, 0, 440.704258124564, 0, 69.0758191406588, 
0, 0, 51.2873010185801, 26.8224496254879, 104.366153205662, 0, 
71.1744651415584, 0, 0)), .Names = c("wk", "pd"), row.names = c(NA, 
70L), class = "data.frame")
fish

# Method 1
fish <- na.omit(fish)
x <- data.frame(boot=1:1000) %>%
  group_by(boot) %>% 
  do(sample_n(fish, nrow(fish), replace=TRUE)) %>%
  group_by(boot,wk)
# Separate statement: summarise the resampled rows by week
plyr::ddply(x,'wk',summarise,Mean=mean(pd),lower=quantile(pd,prob=0.025),upper=quantile(pd,prob=0.975))

    wk       Mean      Lower      Upper
1   1  148.00933   0.000000  317.30844
2   2 1425.26210 643.274777 2322.42315
3   3  217.14835 125.537283  304.37517
4   4  117.85110   0.000000  235.70220
5   5 1058.20252   0.000000 2915.80107
6   6   33.67307   0.000000  101.01921
7   7   62.49518   0.000000  142.11517
8   8    0.00000   0.000000    0.00000
9   9  161.89026   9.867974  356.25816
10 10   36.23577  11.133767   66.07178

# Method 2
boots <- fish %>%
  group_by(wk) %>%
  do(data.frame(rbind(smean.cl.boot(.$pd))))
data.frame(boots)
   wk       Mean    lower     upper
1   1  145.71624   0.0000  317.3084
2   2 1490.79383 317.4445 3540.6977
3   3  215.44592   0.0000  450.8689
4   4  124.15618   0.0000  244.7944
5   5  976.88218   0.0000 5572.7956
6   6   27.88334   0.0000  235.7115
7   7   52.79893   0.0000  278.6700
8   8    0.00000   0.0000    0.0000
9   9  165.98724   0.0000  623.4518
10 10   35.66937   0.0000  104.3662

Am I missing a step in Method 1?
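For reference, here is a minimal sketch of the procedure described above (resample one column within each group, take the mean of each resample, then take quantiles of those means). It is only an illustration under stated assumptions, not a confirmed fix: `toy`, `n_boot`, `boot_means`, `obs` and `res` are made-up names, the data are invented, and only dplyr is attached so that `summarise` is not masked by plyr. As written, Method 1 resamples rows across all weeks at once and takes quantiles of the raw `pd` values, whereas this sketch resamples within each week and takes quantiles of the per-replicate means.

```r
library(dplyr)  # assumption: plyr is NOT attached afterwards, so summarise() is dplyr's

set.seed(42)    # fixed seed only so the sketch is reproducible

# Made-up stand-in for `fish`: two weeks, five observations each
toy <- data.frame(
  wk = rep(1:2, each = 5),
  pd = c(0, 10, 20, 30, 40, 5, 5, 5, 5, 5)
)

n_boot <- 500   # hypothetical number of bootstrap replicates

# One row per (replicate, week): the mean of pd resampled within that week
boot_means <- bind_rows(lapply(seq_len(n_boot), function(b) {
  toy %>%
    group_by(wk) %>%
    summarise(boot = b,
              m = mean(pd[sample.int(n(), replace = TRUE)]))
}))

# Observed mean per week
obs <- toy %>%
  group_by(wk) %>%
  summarise(Mean = mean(pd))

# Percentile interval: quantiles of the bootstrap means, not of the raw values
res <- boot_means %>%
  group_by(wk) %>%
  summarise(lower = quantile(m, 0.025, names = FALSE),
            upper = quantile(m, 0.975, names = FALSE)) %>%
  inner_join(obs, by = "wk")

res
```

Applied to `fish` after `na.omit()`, the same pattern would use `fish` in place of `toy`. The interval limits will still vary from run to run and will not match `smean.cl.boot` exactly, since both rely on random resampling.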

0 answers:

No answers yet