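I need to fill in missing values, by group, based on previous and/or following values. I'd like to do this with dplyr (although a data.table solution would also be welcome).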
Example data:
```r
library(tibble)

testing <- tibble(key = c(10,10,10,10,10,10,20,20,20,20,20,20),
                  year = c(15,15,16,16,17,17,15,15,16,16,17,17),
                  name = c("abc","abc","","","dfg","dfg",
                           "","","nmm","nmm","",""),
                  is_name = c(1,1,0,0,1,1,0,0,0,0,0,0))
```
```
     key  year name  is_name
   <dbl> <dbl> <chr>   <dbl>
 1    10    15 abc         1
 2    10    15 abc         1
 3    10    16             0
 4    10    16             0
 5    10    17 dfg         1
 6    10    17 dfg         1
 7    20    15             0
 8    20    15             0
 9    20    16 nmm         0
10    20    16 nmm         0
11    20    17             0
12    20    17             0
```
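I would like to fill in the missing `name` values: if, within the same `key`, the previous `year` is marked `is_name == 1`, the missing name should be filled with that year's name.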
So the desired output would be:
```
     key  year name  is_name name_new
   <dbl> <dbl> <chr>   <dbl> <chr>
 1    10    15 abc         1 abc
 2    10    15 abc         1 abc
 3    10    16             0 abc
 4    10    16             0 abc
 5    10    17 dfg         1 dfg
 6    10    17 dfg         1 dfg
 7    20    15             0
 8    20    15             0
 9    20    16 nmm         0 nmm
10    20    16 nmm         0 nmm
11    20    17             0
12    20    17             0
```
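As a cross-check of the rule, here is a minimal dplyr sketch. Within each `key`, it looks up the previous year's name and flag, and uses them only when the current name is blank.

It assumes each `key`/`year` pair has a single `name`/`is_name` combination, which holds in the sample data. The names `by_year`, `prev_name`, `prev_flag` and `result` are illustrative only. On the sample data it should reproduce the `name_new` column above, with `""` for rows that stay blank.

```r
library(dplyr)
library(tibble)

testing <- tibble(key = c(10,10,10,10,10,10,20,20,20,20,20,20),
                  year = c(15,15,16,16,17,17,15,15,16,16,17,17),
                  name = c("abc","abc","","","dfg","dfg",
                           "","","nmm","nmm","",""),
                  is_name = c(1,1,0,0,1,1,0,0,0,0,0,0))

# One row per key/year (assumes one name/flag per key/year),
# with the previous year's name and flag looked up within each key
by_year <- testing %>%
  distinct(key, year, name, is_name) %>%
  group_by(key) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(prev_name = lag(name), prev_flag = lag(is_name)) %>%
  ungroup() %>%
  select(key, year, prev_name, prev_flag)

# Fill a blank name only when the previous year in the same key is flagged
result <- testing %>%
  left_join(by_year, by = c("key", "year")) %>%
  mutate(name_new = if_else(name == "" & !is.na(prev_flag) & prev_flag == 1,
                            prev_name, name)) %>%
  select(-prev_name, -prev_flag)
```

This only looks one year back, matching the rule as stated. A blank year that follows another blank year is not filled.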
I tried using `lag` and `lead`, but I couldn't get them to work correctly within groups (`key`).
Thanks!
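A likely cause of the `lag`/`lead` problem is that these functions only respect group boundaries when they are called after `group_by()`. This is an assumption, since the attempted code isn't shown. A minimal illustration on a hypothetical four-row tibble `df`:

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: two keys with two rows each
df <- tibble(key = c(10, 10, 20, 20), name = c("abc", "", "nmm", ""))

# Ungrouped: lag() reads across the boundary between key 10 and key 20
ungrouped <- df %>% mutate(prev = lag(name))

# Grouped: lag() starts over (with NA) at the first row of each key
grouped <- df %>% group_by(key) %>% mutate(prev = lag(name)) %>% ungroup()
```

With grouping, the first row of `key == 20` gets `NA` instead of the last name from `key == 10`.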
Answer (score: 1)
This might work for you:
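```r
library(dplyr)
library(zoo)  # provides na.locf()

testing <- testing %>%
  arrange(key, year) %>%
  # Treat "" as a missing name and is_name == 0 as "not flagged" (NA)
  mutate(name = ifelse(name == "", NA, name),
         is_name = ifelse(is_name == 0, NA, is_name)) %>%
  group_by(key) %>%
  # Within each key, carry the last name and the last flag forward;
  # fill only rows whose name is missing and whose carried flag is 1
  # (blanks that are not filled end up as NA in newname)
  mutate(newname = ifelse(is.na(name) & na.locf(is_name, na.rm = FALSE) == 1,
                          na.locf(name, na.rm = FALSE), name),
         is_name = ifelse(is.na(is_name), 0, is_name)) %>%  # restore the 0 flags
  ungroup()
```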
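One caveat, which the sample data does not exercise: the answer carries the flag and the name forward separately. If a flagged name is followed by an unflagged name and then a blank year, the blank is filled with the unflagged name. For the `edge` data below, the answer's `newname` would be `xyz` in the last row.

Here is a minimal sketch of an alternative using `tidyr::fill()`, in which an unflagged name blocks earlier flagged names from being carried forward. `fill_flagged()` and `edge` are hypothetical names. Like the answer, it fills every blank row after a flagged name, not only the year immediately after it. Unlike the answer, it keeps `""` rather than `NA` for rows that stay blank.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical helper: fill a blank name only from the most recent
# flagged name in the same key; an unflagged name blocks the fill
fill_flagged <- function(df) {
  df %>%
    arrange(key, year) %>%
    group_by(key) %>%
    mutate(source = case_when(
      name == ""   ~ NA_character_,  # blank: candidate to be filled
      is_name == 1 ~ name,           # flagged: usable as a fill value
      TRUE         ~ ""              # unflagged: fills later blanks with ""
    )) %>%
    fill(source, .direction = "down") %>%  # fill() respects the grouping
    ungroup() %>%
    mutate(name_new = if_else(name == "", coalesce(source, ""), name)) %>%
    select(-source)
}

# Edge case: flagged "abc", then unflagged "xyz", then a blank year
edge <- tibble(key = c(30, 30, 30), year = c(15, 16, 17),
               name = c("abc", "xyz", ""), is_name = c(1, 0, 0))
fill_flagged(edge)$name_new
```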