I have a data frame with 4 columns: ID, days, pod, and value.
df <- data.frame(ID = rep(1:3, each = 4),
days = c(1, 7, 12, 7, 10, 10, 1, 7, 14, 7, 7, 20),
pod = factor(c("t1", "t2", "t3", "t2", "t2", "t2", "t1", "t2", "t3", "t2", "t2", "t3")),
value = rnorm(12, mean = 0, sd = 1))
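Each ID has 4 values. For each pod time point, I want to select the value whose day is closest to the following targets:
pod t1 - day 1; pod t2 - day 7; pod t3 - day 14
Ideally, I would like to end up with the following data frame: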
   ID days pod       value
1   1    1  t1 -0.66080611
2   1    7  t2 -1.06817352
3   1   12  t3 -0.50972605
4   1    7  t2          NA
5   2   10  t2          NA
6   2   10  t2          NA
7   2    1  t1  0.32221657
8   2    7  t2  0.96108912
9   3   14  t3 -0.03138917
10  3    7  t2  0.36659820
11  3    7  t2          NA
12  3   20  t3          NA
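Basically, I want to replace a value with NA if it is not the one closest to the target day within its ID and pod group. If the days within a group are tied, I only want to keep the first one.
Answer 0: (score: 1)
I see that you want to replace values that fall on the same day according to their order, which is easy to get wrong. Is there something more... reliable... that could define which values can be dropped? The following code comes close to what you want, but for now it only replaces the values of duplicated pods with the value of the first one: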
set.seed(1)
dat <- data.frame(ID = rep(1:3, each = 4),
days = c(1, 7, 12, 7, 10, 10, 1, 7, 14, 7, 7, 20),
pod = factor(c("t1", "t2", "t3", "t2", "t2", "t2", "t1", "t2", "t3", "t2", "t2", "t3")),
value = rnorm(12, mean = 0, sd = 1))
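library(dplyr)

# helper: signed difference between the actual day and the pod's target day
dat %>% mutate(helper = case_when(pod == 't1' ~ days - 1,
                                  pod == 't2' ~ days - 7,
                                  pod == 't3' ~ days - 14)) %>%
  # within each ID/pod group, rows at the minimum helper get the group's first value
  group_by(ID, pod) %>%
  mutate(min = ifelse(helper == min(helper), first(value), NA))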
# A tibble: 12 x 6
# Groups: ID, pod [7]
      ID  days pod    value helper    min
   <int> <dbl> <fct>  <dbl>  <dbl>  <dbl>
 1     1  1.00 t1    -0.626   0    -0.626
 2     1  7.00 t2     0.184   0     0.184
 3     1 12.0  t3    -0.836  -2.00 -0.836
 4     1  7.00 t2     1.60    0     0.184
 5     2 10.0  t2     0.330   3.00 NA
 6     2 10.0  t2    -0.820   3.00 NA
 7     2  1.00 t1     0.487   0     0.487
 8     2  7.00 t2     0.738   0     0.330
 9     3 14.0  t3     0.576   0     0.576
10     3  7.00 t2    -0.305   0    -0.305
11     3  7.00 t2     1.51    0    -0.305
12     3 20.0  t3     0.390   6.00 NA
Another condition has now been added. It uses some nested ifelse calls, which may not be the most elegant, but I hope it gives you what you want :)
dat %>% mutate(helper = case_when(pod == 't1' ~ days - 1,
                                  pod == 't2' ~ days - 7,
                                  pod == 't3' ~ days - 14)) %>%
  group_by(ID, pod) %>%
  # keep a value only if its helper is the group minimum AND it equals the group's first value
  mutate(min = ifelse(helper == min(helper),
                      ifelse(value == first(value), value, NA), NA))
# A tibble: 12 x 6
# Groups: ID, pod [7]
      ID  days pod    value helper    min
   <int> <dbl> <fct>  <dbl>  <dbl>  <dbl>
 1     1  1.00 t1    -0.626   0    -0.626
 2     1  7.00 t2     0.184   0     0.184
 3     1 12.0  t3    -0.836  -2.00 -0.836
 4     1  7.00 t2     1.60    0    NA
 5     2 10.0  t2     0.330   3.00 NA
 6     2 10.0  t2    -0.820   3.00 NA
 7     2  1.00 t1     0.487   0     0.487
 8     2  7.00 t2     0.738   0    NA
 9     3 14.0  t3     0.576   0     0.576
10     3  7.00 t2    -0.305   0    -0.305
11     3  7.00 t2     1.51    0    NA
12     3 20.0  t3     0.390   6.00 NA
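Note that in the output above row 8 (ID 2, t2, day 7) ends up as NA even though it is the only row in its group that falls exactly on the target day, because first(value) refers to the first row of the whole group rather than the first of the closest rows. A possible variation, offered only as a sketch and not as the one right way to do it, is to measure closeness with an absolute distance and break ties by position. The names target_day, dist and closest below are made up for illustration; the target days themselves come from the question.

```r
library(dplyr)

# Same example data as above (seed fixed so the values are reproducible)
set.seed(1)
dat <- data.frame(
  ID    = rep(1:3, each = 4),
  days  = c(1, 7, 12, 7, 10, 10, 1, 7, 14, 7, 7, 20),
  pod   = factor(c("t1", "t2", "t3", "t2", "t2", "t2",
                   "t1", "t2", "t3", "t2", "t2", "t3")),
  value = rnorm(12, mean = 0, sd = 1)
)

# Illustrative lookup (assumed name): target day for each pod, as stated in the question
target_day <- c(t1 = 1, t2 = 7, t3 = 14)

result <- dat %>%
  # absolute distance, so days before and after the target are treated alike
  mutate(dist = abs(days - unname(target_day[as.character(pod)]))) %>%
  group_by(ID, pod) %>%
  # which.min() returns the position of the first minimum, so ties keep the first row
  mutate(closest = if_else(row_number() == which.min(dist), value, NA_real_)) %>%
  ungroup()

print(result)
```

With the example data, closest is NA in rows 4, 5, 6, 11 and 12, which is the same NA pattern as the desired data frame in the question. Relying on which.min() for the tie-break assumes the rows are already in the order that should decide which one counts as "first".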