Using ifelse and is.null to test for NULL values

Asked: 2017-09-20 12:35:28

Tags: r plyr

I have a data frame:

df <- structure(list(gene = structure(1:6, .Label = c("128up", "14-3-3epsilon", 
"14-3-3zeta", "140up", "18SrRNA-Psi:CR41602", "18SrRNA-Psi:CR45861"
), class = "factor"), fpkm = list(NULL, 0.4, NA_real_, NULL, 
    NULL, NULL)), .Names = c("gene", "fpkm"), row.names = c(NA, 
6L), class = "data.frame")
                 gene fpkm
1               128up NULL
2       14-3-3epsilon  0.4
3          14-3-3zeta   NA
4               140up NULL
5 18SrRNA-Psi:CR41602 NULL
6 18SrRNA-Psi:CR45861 NULL

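I want to add a new column, level, based on the value in fpkm. If the value is NULL or NA, I want level to be 'not_expressed'; otherwise it should be 'expressed'.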

I'm using mutate to do this, testing for NA and NULL separately, but the NULL test does not behave as expected:

mutate(df, level = ifelse(is.na(fpkm), 'not_expressed' , 'expressed'))

                 gene fpkm         level
1               128up NULL     expressed
2       14-3-3epsilon  0.4     expressed
3          14-3-3zeta   NA not_expressed # Expected
4               140up NULL     expressed
5 18SrRNA-Psi:CR41602 NULL     expressed
6 18SrRNA-Psi:CR45861 NULL     expressed
  mutate(df, level = ifelse(is.null(fpkm), 'not_expressed' , 'expressed'))

                 gene fpkm     level
1               128up NULL expressed
2       14-3-3epsilon  0.4 expressed
3          14-3-3zeta   NA expressed
4               140up NULL expressed
5 18SrRNA-Psi:CR41602 NULL expressed
6 18SrRNA-Psi:CR45861 NULL expressed

I can't work out why this isn't working, given that is.null(unlist(test$fpkm[1])) returns TRUE.

I have also tried: ifelse(is.null(df$fpkm), 'not_expressed', 'expressed')

and: ifelse(is.null(unlist(df$fpkm)), 'not_expressed', 'expressed')

...neither of which works.

4 answers:

Answer 0 (score: 3)

Base R handles this nicely:

> df$level <-  ifelse( df$fpkm == 'NULL' | is.na(df$fpkm), 'not_expressed', 'expressed')
> df
                 gene fpkm         level
1               128up NULL not_expressed
2       14-3-3epsilon  0.4     expressed
3          14-3-3zeta   NA not_expressed
4               140up NULL not_expressed
5 18SrRNA-Psi:CR41602 NULL not_expressed
6 18SrRNA-Psi:CR45861 NULL not_expressed

Answer 1 (score: 1)

Your last sentence is the clue to the problem. You have to unlist to get the right result:

unlist(test$fpkm[1])

fpkm is stored in the data frame as a list (rather than as a vector, which is the typical case):

'data.frame':   6 obs. of  2 variables:
 $ gene: Factor w/ 6 levels "128up","14-3-3epsilon",..: 1 2 3 4 5 6
 $ fpkm:List of 6
  ..$ : NULL
  ..$ : num 0.4
  ..$ : num NA
  ..$ : NULL
  ..$ : NULL
  ..$ : NULL
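
Because fpkm is a list, is.null() applied to the whole column asks whether the list itself is NULL and returns a single FALSE, so ifelse() yields one 'expressed' that gets recycled down every row. A minimal sketch of the difference, assuming a standalone list with the same contents as the column (the names whole, single and per_row are illustrative only):

```r
# Standalone copy of the fpkm column contents from the question
fpkm <- list(NULL, 0.4, NA_real_, NULL, NULL, NULL)

# is.null() is not vectorised: it tests the list object as a whole
whole <- is.null(fpkm)

# ifelse() therefore sees a length-1 test and returns a length-1 result
single <- ifelse(whole, "not_expressed", "expressed")

# Applying is.null() to each element gives one logical value per row
per_row <- vapply(fpkm, is.null, logical(1))

print(whole)
print(single)
print(per_row)
```

The per-element version is what the row-wise test needs; the whole-column version only ever produces one value.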

You can get the correct result using:

  library(purrr)  # provides map_lgl()
  mutate(df, level = ifelse(map_lgl(df$fpkm, ~is.null(.x) || is.na(.x)), 'not_expressed' , 'expressed'))

Answer 2 (score: 1)

Convert NULL to NA, then use ifelse:

# Convert NULL to NA
df$fpkm[ sapply(df$fpkm, is.null) ] <- NA

# I would also drop the list, it is up to you.
# df$fpkm <- unlist(df$fpkm)

# Then use ifelse as usual
df$level <-  ifelse(is.na(df$fpkm), "not_expressed", "expressed")

# result
df
#                  gene fpkm         level
# 1               128up   NA not_expressed
# 2       14-3-3epsilon  0.4     expressed
# 3          14-3-3zeta   NA not_expressed
# 4               140up   NA not_expressed
# 5 18SrRNA-Psi:CR41602   NA not_expressed
# 6 18SrRNA-Psi:CR45861   NA not_expressed
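
The conversion and the optional flattening step can also be combined into one pass. This is a rough sketch rather than the code above: it assumes every non-NULL entry holds exactly one numeric value, and the names fpkm_num and level are illustrative only.

```r
# Standalone copy of the fpkm column contents from the question
fpkm <- list(NULL, 0.4, NA_real_, NULL, NULL, NULL)

# Map each entry to a single number, turning NULL into NA_real_.
# vapply() fails loudly if an entry is not length 1 (the assumption here).
fpkm_num <- vapply(
  fpkm,
  function(x) if (is.null(x)) NA_real_ else as.numeric(x),
  numeric(1)
)

# With a plain numeric vector, the usual vectorised is.na() test works
level <- ifelse(is.na(fpkm_num), "not_expressed", "expressed")

print(fpkm_num)
print(level)
```

Working on a plain numeric vector avoids the list-column pitfalls for any later vectorised operation, at the cost of losing the distinction between NULL and NA.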

Answer 3 (score: 0)

As an alternative, you could check the length of each value in the column. Instead of:

mutate(df, level = ifelse(is.na(fpkm), 'not_expressed' , 'expressed'))

Try this:

library(dplyr)    # provides mutate() and if_else()
library(stringr)  # provides str_length()
mutate(df, level = if_else(str_length(fpkm)>0, 'expressed','not_expressed'))
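
The same length-based idea can be sketched in base R with lengths(), which counts the elements of each list entry directly instead of going through a string conversion. This is an alternative sketch, not the code above; has_value, is_missing and level are illustrative names, and the data is a standalone copy of the column.

```r
# Standalone copy of the fpkm column contents from the question
fpkm <- list(NULL, 0.4, NA_real_, NULL, NULL, NULL)

# lengths() returns the length of each list entry; NULL has length 0
has_value <- lengths(fpkm) > 0

# A non-empty entry can still be NA, so test for that separately
is_missing <- vapply(
  fpkm,
  function(x) length(x) > 0 && all(is.na(x)),
  logical(1)
)

level <- ifelse(has_value & !is_missing, "expressed", "not_expressed")

print(level)
```

Checking NA separately matters here because NA has length 1, so a length test alone would label that row as expressed.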