I want to recode a variable to missing in dplyr if it takes one of three values. Consider the following data frame have:
id married hrs_workperwk
1 1 40
2 1 55
3 1 70
4 0 -1
5 1 99
6 0 -2
7 0 10
8 0 40
9 1 45
-1, -2 and 99 are invalid values. The new data frame want should look like this:
id married hrs_workperwk
1 1 40
2 1 55
3 1 70
4 0 NA
5 1 NA
6 0 NA
7 0 10
8 0 40
9 1 45
I can solve this quickly in base R, but dplyr is usually more convenient when I am already using mutate(). Alas, that means I currently use several nested if_else() calls:
want <- mutate(have,
  hrs_workperwk = if_else(hrs_workperwk < 0, as.numeric(NA),
    if_else(hrs_workperwk == 99, as.numeric(NA), hrs_workperwk)))
Is there a way to do this with just one if_else() call? Ideally something like this:
want <- mutate(have,
  hrs_workperwk = if_else(hrs_workperwk == c(-2, -1, 99), as.numeric(NA), hrs_workperwk))
Answer 0 (score: 2)
You can use %in%:
want <- have %>%
mutate(hrs_workperwk = ifelse(hrs_workperwk %in% c(-1, -2, 99), NA, hrs_workperwk))
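To answer the question as asked, the same %in% test can also go into a single if_else(). The sketch below assumes dplyr is installed and that hrs_workperwk is stored as an integer column (hence NA_integer_; a double column would take NA_real_). The name invalid is introduced here for illustration only. It also shows why comparing against a vector with == does not do the job: the shorter vector is recycled and compared position by position.

```r
library(dplyr)

# Example data from the question; storing hrs_workperwk as integer is an assumption
have <- data.frame(
  id = 1:9,
  married = c(1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L),
  hrs_workperwk = c(40L, 55L, 70L, -1L, 99L, -2L, 10L, 40L, 45L)
)

# Illustrative name for the set of invalid codes
invalid <- c(-1L, -2L, 99L)

# `==` recycles `invalid` along the column and compares element by element
eq_hits <- which(have$hrs_workperwk == invalid)

# `%in%` tests each element for membership in the whole set
in_hits <- which(have$hrs_workperwk %in% invalid)

# A single if_else() with a typed NA that matches the integer column
want <- mutate(
  have,
  hrs_workperwk = if_else(hrs_workperwk %in% invalid, NA_integer_, hrs_workperwk)
)
```

With this data, == flags only row 4, while %in% flags rows 4, 5 and 6.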
Answer 1 (score: 2)
We can use replace:
df %>%
mutate(hrs_workperwk = replace(hrs_workperwk, hrs_workperwk %in% c(-1, -2, 99), NA))
# id married hrs_workperwk
#1 1 1 40
#2 2 1 55
#3 3 1 70
#4 4 0 NA
#5 5 1 NA
#6 6 0 NA
#7 7 0 10
#8 8 0 40
#9 9 1 45
Or another option is case_when:
df %>%
mutate(hrs_workperwk = case_when(hrs_workperwk %in% c(-1, -2, 99)~ NA_integer_,
TRUE ~ hrs_workperwk))
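In more recent dplyr releases the fallback branch can also be written with the .default argument instead of TRUE ~. A minimal sketch, assuming dplyr 1.1.0 or later and an integer column; the data frame is rebuilt here only so that the snippet runs on its own.

```r
library(dplyr)

# Same example data as in the question (integer storage is an assumption)
df <- data.frame(
  id = 1:9,
  married = c(1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L),
  hrs_workperwk = c(40L, 55L, 70L, -1L, 99L, -2L, 10L, 40L, 45L)
)

want <- df %>%
  mutate(hrs_workperwk = case_when(
    # invalid codes become a typed missing value
    hrs_workperwk %in% c(-1, -2, 99) ~ NA_integer_,
    # every other row keeps its original value
    .default = hrs_workperwk
  ))
```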
Answer 2 (score: 1)
In base R:
df1$hrs_workperwk[df1$hrs_workperwk %in% c(-1,-2,99)] <- NA
or
is.na(df1$hrs_workperwk) <- df1$hrs_workperwk %in% c(-1,-2,99)
Output in both cases:
# id married hrs_workperwk
# 1 1 1 40
# 2 2 1 55
# 3 3 1 70
# 4 4 0 NA
# 5 5 1 NA
# 6 6 0 NA
# 7 7 0 10
# 8 8 0 40
# 9 9 1 45
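If several columns share the same invalid codes, the base R idea can be wrapped in a small function. This is only a sketch: set_invalid_to_na is a hypothetical helper name, not something provided by base R or dplyr.

```r
# Hypothetical helper: replace every value found in `invalid` with NA
set_invalid_to_na <- function(x, invalid) {
  replace(x, x %in% invalid, NA)
}

# Example data from the question (integer storage is an assumption)
df1 <- data.frame(
  id = 1:9,
  married = c(1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L),
  hrs_workperwk = c(40L, 55L, 70L, -1L, 99L, -2L, 10L, 40L, 45L)
)

df1$hrs_workperwk <- set_invalid_to_na(df1$hrs_workperwk, c(-1, -2, 99))
```

Because the helper takes and returns a plain vector, it could equally be called inside mutate().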
Data
df1 <- read.table(text="
id married hrs_workperwk
1 1 40
2 1 55
3 1 70
4 0 -1
5 1 99
6 0 -2
7 0 10
8 0 40
9 1 45", header = TRUE, stringsAsFactors = FALSE)
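Since the case_when answer passes a typed NA (NA_integer_), it can help to check how the column was actually read. A small sketch using the same data; col_class and class_after are illustrative names.

```r
df1 <- read.table(text = "
id married hrs_workperwk
1 1 40
2 1 55
3 1 70
4 0 -1
5 1 99
6 0 -2
7 0 10
8 0 40
9 1 45", header = TRUE, stringsAsFactors = FALSE)

# Whole numbers are read as integer, which is why NA_integer_ matches this column
col_class <- class(df1$hrs_workperwk)

# Assigning a plain NA in base R keeps the column's type
df1$hrs_workperwk[df1$hrs_workperwk %in% c(-1, -2, 99)] <- NA
class_after <- class(df1$hrs_workperwk)
```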