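Replace NA with the first occurring value if a condition is met

Asked: 2020-03-24 10:20:11

Tags: r dplyr conditional-statements na

In a data.frame, I want to "fill" NA values with the previous AGE value, subject to a condition.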

>x <- data.frame(
    ID = c(1,1,1,1,2,2,2,2,3,3,4,4,4,4),
    YEAR = c(2016,2017,2018,2019,2016,2017,2018,2019,2016,2018,2016,2017,2018,2019),
    AGE = c("ADULT", NA, NA, NA, "ADULT", NA, "ADULT", NA, "JUVENILE", NA, "JUVENILE", "ADULT", NA, NA)
)

>x
   ID YEAR      AGE
1   1 2016   ADULT
2   1 2017     <NA>
3   1 2018     <NA>
4   1 2019     <NA>
5   2 2016   ADULT
6   2 2017     <NA>
7   2 2018   ADULT
8   2 2019     <NA>
9   3 2016 JUVENILE
10  3 2018     <NA>
11  4 2016 JUVENILE
12  4 2017   ADULT
13  4 2018     <NA>
14  4 2019     <NA>

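If the AGE is ADULT, I want to carry it forward to the following years. However, if the first AGE recorded for an ID is JUVENILE, I want to fill the following years with ADULT instead.

I tried a few things, but I could not find a solution that conditions on the first occurrence.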

library(dplyr); library(tidyr)  # group_by() and fill()
x.age.ok <- x %>% group_by(ID) %>% fill(AGE, .direction = "down")

I got:

>x.age.ok
   ID YEAR      AGE
1   1 2016   ADULT
2   1 2017   ADULT
3   1 2018   ADULT
4   1 2019   ADULT
5   2 2016   ADULT
6   2 2017   ADULT
7   2 2018   ADULT
8   2 2019   ADULT
9   3 2016 JUVENILE
10  3 2018 JUVENILE
11  4 2016 JUVENILE
12  4 2017   ADULT
13  4 2018   ADULT
14  4 2019   ADULT

But I want this (the difference is highlighted with **):

>x.age.ok
   ID YEAR      AGE
1   1 2016   ADULT
2   1 2017   ADULT
3   1 2018   ADULT
4   1 2019   ADULT
5   2 2016   ADULT
6   2 2017   ADULT
7   2 2018   ADULT
8   2 2019   ADULT
9   3 2016 JUVENILE
10  3 2018   **ADULT**
11  4 2016 JUVENILE
12  4 2017   ADULT
13  4 2018   ADULT
14  4 2019   ADULT

Any ideas? Can we put an if inside mutate?

1 answer:

Answer 0: (score: 0)

Maybe you can try:

library(dplyr)

x %>%
  arrange(ID, YEAR) %>%
  group_by(ID) %>%
  mutate(AGE = if(first(AGE) == "JUVENILE") replace(AGE, is.na(AGE), "ADULT") 
               else replace(AGE, is.na(AGE), first(AGE)))


#      ID  YEAR AGE     
#   <dbl> <dbl> <fct>   
# 1     1  2016 ADULT   
# 2     1  2017 ADULT   
# 3     1  2018 ADULT   
# 4     1  2019 ADULT   
# 5     2  2016 ADULT   
# 6     2  2017 ADULT   
# 7     2  2018 ADULT   
# 8     2  2019 ADULT   
# 9     3  2016 JUVENILE
#10     3  2018 ADULT   
#11     4  2016 JUVENILE
#12     4  2017 ADULT   
#13     4  2018 ADULT   
#14     4  2019 ADULT   

If the first value of AGE for an ID is "JUVENILE", we replace all NA values with "ADULT"; otherwise we replace them with the first value of AGE.
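Note that `if` works inside `mutate` here only because `first(AGE)` is a single value per group. A variant that may be worth considering keeps the `fill()` call from the question and then corrects the filled rows with the vectorised `if_else()`. This is only a sketch, and it rests on some assumptions: AGE is stored as character (hence `stringsAsFactors = FALSE`), and the names `was_na` and `res` are made up for illustration. Unlike the code above, it carries forward the previous value rather than the first one, which gives the same result on this data.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question; AGE kept as character (assumption)
x <- data.frame(
  ID = c(1,1,1,1,2,2,2,2,3,3,4,4,4,4),
  YEAR = c(2016,2017,2018,2019,2016,2017,2018,2019,2016,2018,2016,2017,2018,2019),
  AGE = c("ADULT", NA, NA, NA, "ADULT", NA, "ADULT", NA,
          "JUVENILE", NA, "JUVENILE", "ADULT", NA, NA),
  stringsAsFactors = FALSE
)

res <- x %>%
  arrange(ID, YEAR) %>%
  group_by(ID) %>%
  # remember which rows were missing before filling (hypothetical helper column)
  mutate(was_na = is.na(AGE)) %>%
  # carry the last known AGE forward within each ID
  fill(AGE, .direction = "down") %>%
  # a filled-in JUVENILE means the animal has aged by then, so make it ADULT
  mutate(AGE = if_else(was_na & AGE == "JUVENILE", "ADULT", AGE)) %>%
  ungroup() %>%
  select(-was_na)

print(res)
```

Rows that were observed as JUVENILE are left untouched; only rows that were originally NA and inherited JUVENILE from `fill()` are changed.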