Question

I have a large sample of medical data called oct:

Providers  ID    date  ICD
Billy      4504  9/11  f.11
Billy      5090  9/10  r.05
Max        4430  9/01  k.11
Mindy      0812  9/30  f.11
etc.

I want a random sample of ID numbers for each provider. I tried:

review <- oct %>% group_by(Providers) %>% do (sample(oct$ID, size = 5, replace= FALSE, prob = NULL))
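The attempt above has two problems: sample(oct$ID, ...) draws from the whole ID column of oct rather than from the current group, and an unnamed argument to do() has to return a data frame, not a bare vector. A minimal sketch of the do()-based fix follows, assuming a small stand-in for oct built only from the rows shown (the real data is larger, and storing ID as character is an assumption made to keep the leading zero in 0812). It also caps the sample size at the group size, since sampling 5 IDs without replacement from a provider with only 1 or 2 rows would otherwise error.

```r
library(dplyr)

# Hypothetical stand-in for `oct`, built only from the rows shown in the question.
# ID is stored as character (an assumption) so "0812" keeps its leading zero.
oct <- data.frame(
  Providers = c("Billy", "Billy", "Max", "Mindy"),
  ID        = c("4504", "5090", "4430", "0812"),
  date      = c("9/11", "9/10", "9/01", "9/30"),
  ICD       = c("f.11", "r.05", "k.11", "f.11"),
  stringsAsFactors = FALSE
)

set.seed(1)
review <- oct %>%
  group_by(Providers) %>%
  # Inside do(), `.` is the current group's rows, so index .$ID, not oct$ID.
  # min() caps the draw at the group size; sampling row positions with
  # sample.int() also sidesteps sample()'s special case for a single numeric value.
  do(data.frame(ID = .$ID[sample.int(nrow(.), min(5, nrow(.)))],
                stringsAsFactors = FALSE))

review
```

do() still works but is superseded in current dplyr, so the answer's sample_n() approach reads more simply.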

Answer 1

An example using dplyr::sample_n:

library(dplyr)
set.seed(1)
mtcars %>% group_by(cyl) %>% sample_n(3)

# A tibble: 9 x 11
# Groups:   cyl [3]
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
2  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
3  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
4  19.7     6 145     175  3.62  2.77  15.5     0     1     5     6
5  21       6 160     110  3.9   2.88  17.0     0     1     4     4
6  19.2     6 168.    123  3.92  3.44  18.3     1     0     4     4
7  15       8 301     335  3.54  3.57  14.6     0     1     5     8
8  15.5     8 318     150  2.76  3.52  16.9     0     0     3     2
9  14.7     8 440     230  3.23  5.34  17.4     0     0     3     4
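From dplyr 1.0.0 onward, sample_n() is superseded by slice_sample(). A sketch of the same per-group draw with the newer function is below; don't expect it to pick exactly the rows shown above for the same seed. One practical difference for the question's data: when a group has fewer rows than n and replace = FALSE, slice_sample() silently returns the whole group, whereas sample_n() stops with an error.

```r
library(dplyr)

set.seed(1)
# Draw 3 rows at random from each cylinder group
res <- mtcars %>%
  group_by(cyl) %>%
  slice_sample(n = 3)

res
```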

If you only want one specific variable (ID in the question, mpg in this example):

set.seed(1)

mtcars %>% 
  group_by(cyl) %>% 
  sample_n(3) %>%
  pull(mpg)

[1] 22.8 32.4 33.9 19.7 21.0 19.2 15.0 15.5 14.7
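pull() returns a bare vector, so the provider each ID came from is lost. A sketch applied to the question's columns that keeps the label alongside the ID, using select() instead of pull(), is shown here. It again assumes a hypothetical stand-in for oct built from the rows shown, and uses slice_sample() so that providers with fewer than 5 rows return all of their IDs instead of raising an error.

```r
library(dplyr)

# Hypothetical stand-in for `oct`, built from the rows shown in the question
oct <- data.frame(
  Providers = c("Billy", "Billy", "Max", "Mindy"),
  ID        = c("4504", "5090", "4430", "0812"),
  date      = c("9/11", "9/10", "9/01", "9/30"),
  ICD       = c("f.11", "r.05", "k.11", "f.11"),
  stringsAsFactors = FALSE
)

set.seed(1)
review <- oct %>%
  group_by(Providers) %>%
  slice_sample(n = 5) %>%      # up to 5 random rows per provider
  select(Providers, ID)        # keep the provider label next to each sampled ID

review
```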
