Why is NA ignored when using the ifelse / mutate functions?

Time: 2019-12-01 16:10:52

Tags: r

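I have a data frame containing occurrences of several different species, and I would like to fill an empty "new_name" column using mutate / ifelse. Basically, I want new_name to be filled according to the following conditions: if status is "unaccepted", new_name should take the value of "valid_name"; if status is "accepted" or NA, new_name should take the value of "species". Here is an example of the data frame structure: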

Example data frame

    species             | valid_name               | new_name | status
1.  Tilapia guineensis  | NA                       | NA       | NA
2.  Tilapia zillii      | Hippocampus trimaculatus | NA       | unaccepted
3.  Fundulus rubrifrons | Hippocampus trimaculatus | NA       | unaccepted
4.  Eutrigla gurnardus  | Bougainvillia supercili  | NA       | accepted
5.  Sprattus sprattus   | NA                       | NA       | NA
6.  Gadus morhua        | Aglantha digitale        | NA       | accepted

So far I have tried the following:

df<-df%>%
  mutate(new_name = ifelse(status=="unaccepted",valid_name,ifelse(status=="accepted" | is.na(status),species,NA)))

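This code only works for status values that are not NA. Where status is NA, new_name is simply left as NA instead of taking the species value. What I want is a data frame like this: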

    species             | valid_name               | new_name                 | status
1.  Tilapia guineensis  | NA                       | Tilapia guineensis       | NA
2.  Tilapia zillii      | Hippocampus trimaculatus | Hippocampus trimaculatus | unaccepted
3.  Fundulus rubrifrons | Hippocampus trimaculatus | Hippocampus trimaculatus | unaccepted
4.  Eutrigla gurnardus  | Bougainvillia supercili  | Eutrigla gurnardus       | accepted
5.  Sprattus sprattus   | NA                       | Sprattus sprattus        | NA
6.  Gadus morhua        | Aglantha digitale        | Gadus morhua             | accepted

Thanks in advance for your answers.

2 answers:

Answer 0 (score: 1)

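If we use ==, make sure to also add an is.na check so that the condition returns TRUE / FALSE. Otherwise a comparison against NA remains NA, and ifelse returns NA for that element.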

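library(dplyr)
df %>%
  # & !is.na(status) turns an NA comparison into FALSE, so rows with an NA
  # status fall through to the species branch instead of yielding NA
  mutate(new_name = ifelse(status == "unaccepted" & !is.na(status), valid_name,
           ifelse(status == "accepted" & !is.na(status), species, species)))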
#              species               valid_name     status                 new_name
#1  Tilapia guineensis                     <NA>       <NA>       Tilapia guineensis
#2      Tilapia zillii Hippocampus trimaculatus unaccepted Hippocampus trimaculatus
#3 Fundulus rubrifrons Hippocampus trimaculatus unaccepted Hippocampus trimaculatus
#4  Eutrigla gurnardus  Bougainvillia supercili   accepted       Eutrigla gurnardus
#5   Sprattus sprattus                     <NA>       <NA>        Sprattus sprattus
#6        Gadus morhua        Aglantha digitale   accepted             Gadus morhua

Another option is to use %in%, which returns FALSE for NA

df%>%
  mutate(new_name = ifelse(status %in% "unaccepted" ,valid_name,
           ifelse(status %in% "accepted",species, species)))

Using a reproducible example

v1 <- c('a', 'b', NA)
v1 == 'a'
#[1]  TRUE FALSE    NA  ####

v1 %in% 'a'
#[1]  TRUE FALSE FALSE
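The same comparison can be carried through to ifelse itself to see why the attempt in the question leaves NA behind. The following is a minimal base R sketch, not the asker's actual data: the three vectors are hypothetical stand-ins for the status, valid_name and species columns.

```r
# Toy vectors standing in for the status, valid_name and species columns
status     <- c(NA, "unaccepted", "accepted")
valid_name <- c(NA, "Hippocampus trimaculatus", "Bougainvillia supercili")
species    <- c("Tilapia guineensis", "Tilapia zillii", "Eutrigla gurnardus")

# The outer test is NA for the first element, so ifelse() returns NA there.
# The inner ifelse() would give species for that element (NA | TRUE is TRUE),
# but its value is never picked because the outer test is not FALSE.
res_eq <- ifelse(status == "unaccepted", valid_name,
                 ifelse(status == "accepted" | is.na(status), species, NA))

# %in% never returns NA, so every element lands in one of the two branches
res_in <- ifelse(status %in% "unaccepted", valid_name, species)

print(res_eq)  # first element is NA
print(res_in)  # first element is "Tilapia guineensis"
```

In other words, the is.na(status) check in the question sits in the inner call, where it comes too late: the outer condition has already produced NA for those rows.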

data

df <- structure(list(species = c("Tilapia guineensis", "Tilapia zillii", 
"Fundulus rubrifrons", "Eutrigla gurnardus", "Sprattus sprattus", 
"Gadus morhua"), valid_name = c(NA, "Hippocampus trimaculatus", 
"Hippocampus trimaculatus", "Bougainvillia supercili", NA, 
"Aglantha digitale"
), status = c(NA, "unaccepted", "unaccepted", "accepted", NA, 
"accepted")), class = "data.frame", row.names = c(NA, -6L))

Answer 1 (score: 0)

I would like to offer an alternative using case_when from dplyr, which provides a nice and intuitive syntax:

library(dplyr)
df <- structure(list(
    species = c("Tilapia guineensis", "Tilapia zillii", "Fundulus rubrifrons",
                "Eutrigla gurnardus", "Sprattus sprattus", "Gadus morhua"),
    valid_name = c(NA, "Hippocampus trimaculatus", "Hippocampus trimaculatus",
                   "Bougainvillia supercili", NA, "Aglantha digitale"),
    status = c(NA, "unaccepted", "unaccepted", "accepted", NA, "accepted")
), class = "data.frame", row.names = c(NA, -6L))

df <- df %>% 
    mutate(new_name = case_when(
        status == "unaccepted" ~ valid_name,
        status == "accepted" | is.na(status) ~ species
    ))
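case_when treats a condition that evaluates to NA as not matched and moves on to the next one, which is why the rows with an NA status reach the second branch here. As a hedged variant, since every status other than "unaccepted" should take species, a catch-all TRUE branch may be enough. The sketch below assumes that reading of the requirements; df_small is a hypothetical cut-down copy of the data, not part of the original answer.

```r
library(dplyr)

# Hypothetical three-row stand-in for the question's data
df_small <- data.frame(
  species    = c("Tilapia guineensis", "Tilapia zillii", "Eutrigla gurnardus"),
  valid_name = c(NA, "Hippocampus trimaculatus", "Bougainvillia supercili"),
  status     = c(NA, "unaccepted", "accepted"),
  stringsAsFactors = FALSE
)

out <- df_small %>%
  mutate(new_name = case_when(
    status == "unaccepted" ~ valid_name,  # NA here counts as "no match"
    TRUE                   ~ species      # catch-all for accepted and NA rows
  ))

print(out$new_name)
```

Note that the catch-all would also assign species to any status value not anticipated here, so the explicit conditions in the answer are the safer choice if other values can occur.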