How to set a different set.seed() for each group and then call sample()

Time: 2019-12-04 08:19:06

Tags: r dataframe dplyr r-base

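I want to sample one number between the min and max columns of a grouped data.frame, using a different seed for each group. I tried several approaches, which you can see in the reproducible example below, but none of them work.
The data.frame consists of four columns: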

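letter - my grouping variable
seed - a dynamic integer that is specific to each group/letter
min - the minimum value for sample()
max - the maximum value for sample()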

Here is the reproducible example:

library(dplyr)

set.seed(123)
data.frame(letter = sample(letters[1:3],20, replace=TRUE)) %>% 
  group_by(letter) %>% 
  summarise(seed = n()) %>% 
  mutate(min = ifelse(letter == "a", 20,
                      ifelse(letter == "b", 40, 60)),
         max = ifelse(letter == "a", 30,
                      ifelse(letter == "b", 50, 70)))  %>%

  group_by(letter) %>%
  # set.seed(seed) %>%  # or mutate(randomNumber = sample(min:max, 1, set.seed(seed))) # these aren't working, but I hope you get my point 
  mutate(randomNumber = sample(min:max, 1))


Thanks a lot!

1 Answer:

Answer 0 (score: 1)

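I suggest using pmap (here pmap_dbl) from the purrr package in the last line, so that set.seed() is called with each row's seed right before sample():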

library(tidyverse)

set.seed(123)
data.frame(letter = sample(letters[1:3],20, replace=TRUE)) %>% 
  group_by(letter) %>% 
  summarise(seed = n()) %>% 
  mutate(min = ifelse(letter == "a", 20,
                      ifelse(letter == "b", 40, 60)),
         max = ifelse(letter == "a", 30,
                      ifelse(letter == "b", 50, 70)))  %>%

  group_by(letter) %>%
  mutate(randomNumber = pmap_dbl(list(min, max, seed), function(x, y, z){set.seed(z); sample(x:y, 1)}))


# A tibble: 3 x 5
# Groups:   letter [3]
  letter  seed   min   max randomNumber
  <fct>  <int> <dbl> <dbl>        <dbl>
1 a          5    20    30           21
2 b          7    40    50           49
3 c          8    60    70           63
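
Since the question is also tagged r-base, the same idea can be sketched without purrr. This is only a minimal sketch, not part of the original answer: `sample_with_seed` is a hypothetical helper name, and the data.frame is typed in by hand from the table above rather than computed. It uses `mapply()` to walk over the rows and reseeds the generator before each draw.

```r
# Hypothetical helper (not from the original answer): seed the RNG,
# then draw one integer from lo:hi.
sample_with_seed <- function(lo, hi, seed) {
  set.seed(seed)
  # sample.int() on the range length avoids the sample(x, 1) surprise
  # when lo == hi (sample(5, 1) would draw from 1:5 instead of returning 5).
  lo + sample.int(hi - lo + 1, 1) - 1
}

# Values copied by hand from the summarised table shown above.
df <- data.frame(
  letter = c("a", "b", "c"),
  seed   = c(5L, 7L, 8L),
  min    = c(20, 40, 60),
  max    = c(30, 50, 70)
)

# One draw per row, each with its own seed.
df$randomNumber <- mapply(sample_with_seed, df$min, df$max, df$seed)
df
```

Note that calling set.seed() inside the helper also resets the global random number stream, so any random draws made afterwards continue from the last seed used. The exact numbers drawn for a given seed can also differ between R versions, because the default sampling algorithm changed in R 3.6.0.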