Reassign values based on other variables within a group in R

Time: 2018-03-04 22:19:58

Tags: r dplyr

I have a data frame with 4 columns: ID, days, pod and value.

df <- data.frame(ID = rep(1:3, each = 4),
                 days = c(1, 7, 12, 7, 10, 10, 1, 7, 14, 7, 7, 20),
                 pod = factor(c("t1", "t2", "t3", "t2", "t2", "t2", "t1", "t2", "t3", "t2", "t2", "t3")),
                 value = rnorm(12, mean = 0, sd = 1))

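Each ID has 4 values. For each pod time point, I want to select the value whose day is closest to the following convention:

pod t1: day 1; pod t2: day 7; pod t3: day 14

Ideally, I would like to end up with the following data frame: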

   ID days pod       value
1   1    1  t1 -0.66080611
2   1    7  t2 -1.06817352
3   1   12  t3 -0.50972605
4   1    7  t2          NA
5   2   10  t2          NA
6   2   10  t2          NA
7   2    1  t1  0.32221657
8   2    7  t2  0.96108912
9   3   14  t3 -0.03138917
10  3    7  t2  0.36659820
11  3    7  t2          NA
12  3   20  t3          NA

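Basically, I want to replace a value with NA if its day is not the closest to the target day within its ID and pod group. If several days within a group are equally close, I only want to keep the first one.

1 answer:

Answer 0: (score: 1)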

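I find it error-prone that you want to replace values based on their order within the same day. Isn't there something more... reliable... to define which values can be omitted? The following code comes close to what you want, but for now it only replaces the duplicated pods with the value of the first one: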

library(dplyr)

set.seed(1)
dat <- data.frame(ID = rep(1:3, each = 4),
                  days = c(1, 7, 12, 7, 10, 10, 1, 7, 14, 7, 7, 20),
                  pod = factor(c("t1", "t2", "t3", "t2", "t2", "t2", "t1", "t2", "t3", "t2", "t2", "t3")),
                  value = rnorm(12, mean = 0, sd = 1))

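# helper: signed offset of days from the target day of each pod
dat %>% mutate(helper = case_when(pod == 't1' ~ days-1,
                                  pod == 't2' ~ days-7,
                                  pod == 't3' ~ days-14)) %>%
  # rows with the smallest helper in their ID/pod group get the group's first value
  group_by(ID, pod) %>% mutate(min = ifelse(helper == min(helper), 
                                            first(value), NA ))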

# A tibble: 12 x 6
# Groups:   ID, pod [7]
      ID  days pod    value helper     min
   <int> <dbl> <fct>  <dbl>  <dbl>   <dbl>
 1     1  1.00 t1    -0.626   0    - 0.626
 2     1  7.00 t2     0.184   0      0.184
 3     1 12.0  t3    -0.836  -2.00 - 0.836
 4     1  7.00 t2     1.60    0      0.184
 5     2 10.0  t2     0.330   3.00  NA    
 6     2 10.0  t2    -0.820   3.00  NA    
 7     2  1.00 t1     0.487   0      0.487
 8     2  7.00 t2     0.738   0      0.330
 9     3 14.0  t3     0.576   0      0.576
10     3  7.00 t2    -0.305   0    - 0.305
11     3  7.00 t2     1.51    0    - 0.305
12     3 20.0  t3     0.390   6.00  NA
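
One thing to keep in mind about `helper`: it is a signed offset, so `min(helper)` picks the earliest day rather than the nearest one. In the sample data this makes no difference, because no group contains a day before its target next to a closer day. The snippet below is only a small illustration with a hypothetical t3 group (the `days` vector is made up for this example, not taken from the question):

```r
# Hypothetical t3 group with one day before the day-14 target and one on it
days <- c(12, 14)

signed_offset <- days - 14       # -2  0
abs_distance  <- abs(days - 14)  #  2  0

days[which.min(signed_offset)]   # picks day 12, the earliest day
days[which.min(abs_distance)]    # picks day 14, the closest day
```

If "closest" is meant in both directions, wrapping the offset in `abs()` is probably the safer choice.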

Now with another condition added. It is a bit of ifelse nesting, maybe not the most elegant, but it gives you what you want, I hope :)

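# helper: signed offset of days from the target day of each pod
dat %>% mutate(helper = case_when(pod == 't1' ~ days-1,
                                  pod == 't2' ~ days-7,
                                  pod == 't3' ~ days-14)) %>%
  # keep a value only if its row has the smallest helper in the ID/pod group
  # and its value equals the first value of that group
  group_by(ID, pod) %>% mutate(min = ifelse(helper == min(helper), 
                                            ifelse(value == first(value), value, NA ), NA))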


# A tibble: 12 x 6
# Groups:   ID, pod [7]
      ID  days pod    value helper     min
   <int> <dbl> <fct>  <dbl>  <dbl>   <dbl>
 1     1  1.00 t1    -0.626   0    - 0.626
 2     1  7.00 t2     0.184   0      0.184
 3     1 12.0  t3    -0.836  -2.00 - 0.836
 4     1  7.00 t2     1.60    0     NA    
 5     2 10.0  t2     0.330   3.00  NA    
 6     2 10.0  t2    -0.820   3.00  NA    
 7     2  1.00 t1     0.487   0      0.487
 8     2  7.00 t2     0.738   0     NA    
 9     3 14.0  t3     0.576   0      0.576
10     3  7.00 t2    -0.305   0    - 0.305
11     3  7.00 t2     1.51    0     NA    
12     3 20.0  t3     0.390   6.00  NA
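
In the output above, row 8 (ID 2, day 7, t2) ends up as NA because its value is compared with the first value of the whole group (row 5), while the desired table in the question keeps that row. A possible alternative, offered as a sketch rather than a definitive solution, is to measure the absolute distance to the target day and keep only the first closest row per group. The names `target_day`, `dist`, `kept` and `res` are introduced here for illustration and do not appear in the question:

```r
library(dplyr)

set.seed(1)
dat <- data.frame(ID = rep(1:3, each = 4),
                  days = c(1, 7, 12, 7, 10, 10, 1, 7, 14, 7, 7, 20),
                  pod = factor(c("t1", "t2", "t3", "t2", "t2", "t2",
                                 "t1", "t2", "t3", "t2", "t2", "t3")),
                  value = rnorm(12, mean = 0, sd = 1))

# Assumed lookup of the target day per pod, taken from the question's convention
target_day <- c(t1 = 1, t2 = 7, t3 = 14)

res <- dat %>%
  # absolute distance between each day and the target day of its pod
  mutate(dist = abs(days - unname(target_day[as.character(pod)]))) %>%
  group_by(ID, pod) %>%
  # which.min() returns the index of the first minimum, so ties go to the first row
  mutate(kept = ifelse(row_number() == which.min(dist), value, NA_real_)) %>%
  ungroup()

# rows whose value was replaced by NA
which(is.na(res$kept))
```

With the sample data this leaves NA in rows 4, 5, 6, 11 and 12, which is the same pattern as the table in the question. Comparing row positions instead of values also avoids relying on `==` between floating-point numbers.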