I have a data frame:
df <- structure(list(gene = structure(1:6, .Label = c("128up", "14-3-3epsilon",
"14-3-3zeta", "140up", "18SrRNA-Psi:CR41602", "18SrRNA-Psi:CR45861"
), class = "factor"), fpkm = list(NULL, 0.4, NA_real_, NULL,
NULL, NULL)), .Names = c("gene", "fpkm"), row.names = c(NA,
6L), class = "data.frame")
gene fpkm
1 128up NULL
2 14-3-3epsilon 0.4
3 14-3-3zeta NA
4 140up NULL
5 18SrRNA-Psi:CR41602 NULL
6 18SrRNA-Psi:CR45861 NULL
I want to add a new column, level, based on the values in fpkm. If the value is NULL or NA, I want it to be 'not_expressed', otherwise 'expressed'.
I am using mutate to do this, testing separately for NA and NULL values, but it does not work as expected for the NULL values:
mutate(df, level = ifelse(is.na(fpkm), 'not_expressed' , 'expressed'))
gene fpkm level
1 128up NULL expressed
2 14-3-3epsilon 0.4 expressed
3 14-3-3zeta NA not_expressed # Expected
4 140up NULL expressed
5 18SrRNA-Psi:CR41602 NULL expressed
6 18SrRNA-Psi:CR45861 NULL expressed
mutate(df, level = ifelse(is.null(fpkm), 'not_expressed' , 'expressed'))
gene fpkm level
1 128up NULL expressed
2 14-3-3epsilon 0.4 expressed
3 14-3-3zeta NA expressed
4 140up NULL expressed
5 18SrRNA-Psi:CR41602 NULL expressed
6 18SrRNA-Psi:CR45861 NULL expressed
I can't figure out why this doesn't work, since is.null(unlist(test$fpkm[1])) returns TRUE.
I also tried:
ifelse(is.null(df$fpkm), 'not_expressed', 'expressed')
and:
ifelse(is.null(unlist(df$fpkm)), 'not_expressed', 'expressed')
...neither of which works.
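Why the is.null() attempts label every row 'expressed': is.null() is not vectorised, so applied to the list column it answers a single question about the whole list (is the list object itself NULL?) and returns one FALSE, which ifelse() and mutate() then recycle to every row. The sketch below rebuilds the question's data (gene is kept as character for brevity, an assumption that does not affect the result) and contrasts that single value with element-wise tests:

```r
# Rebuild the question's data; fpkm is a list column holding NULL, 0.4 and NA
df <- structure(
  list(gene = c("128up", "14-3-3epsilon", "14-3-3zeta", "140up",
                "18SrRNA-Psi:CR41602", "18SrRNA-Psi:CR45861"),
       fpkm = list(NULL, 0.4, NA_real_, NULL, NULL, NULL)),
  row.names = c(NA, 6L), class = "data.frame")

# One answer for the whole list: the list object itself is not NULL
whole_column <- is.null(df$fpkm)
print(whole_column)            # FALSE (a single value)

# So ifelse() returns a single label, which mutate() recycles to all rows
print(ifelse(whole_column, "not_expressed", "expressed"))   # "expressed"

# Element-wise tests give one answer per row
per_row_null <- sapply(df$fpkm, is.null)
per_row_na   <- is.na(df$fpkm)
print(per_row_null)            # TRUE FALSE FALSE TRUE TRUE TRUE
print(per_row_na)              # FALSE FALSE TRUE FALSE FALSE FALSE
```

The same reasoning explains the unlist() attempt: unlist(df$fpkm) drops the NULL elements, so is.null() on the result is again a single FALSE.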
Answer 0 (score: 3)
Base R handles this fine:
> df$level <- ifelse( df$fpkm == 'NULL' | is.na(df$fpkm), 'not_expressed', 'expressed')
> df
gene fpkm level
1 128up NULL not_expressed
2 14-3-3epsilon 0.4 expressed
3 14-3-3zeta NA not_expressed
4 140up NULL not_expressed
5 18SrRNA-Psi:CR41602 NULL not_expressed
6 18SrRNA-Psi:CR45861 NULL not_expressed
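The comparison with 'NULL' works because comparing a list with a character string first coerces the list to character, and a NULL element becomes the literal text "NULL". A small check on a stand-alone copy of the fpkm values (the variable name fpkm here is only for illustration):

```r
# Stand-alone copy of the question's fpkm values
fpkm <- list(NULL, 0.4, NA_real_, NULL, NULL, NULL)

# Coercing the list to character turns each NULL element into the text "NULL"
as_text <- as.character(fpkm)
print(as_text)                  # "NULL" "0.4" "NA" "NULL" "NULL" "NULL"

# == against a string performs that same coercion, so NULL rows match 'NULL'
is_null_text <- fpkm == "NULL"
print(is_null_text)             # TRUE FALSE FALSE TRUE TRUE TRUE

# Combined with is.na(), every NULL or NA row is flagged
flag <- is_null_text | is.na(fpkm)
print(ifelse(flag, "not_expressed", "expressed"))
```

One side effect of this approach: an element that genuinely holds the string "NULL" would also be classed as not_expressed.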
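Answer 1 (score: 1)
Your last sentence is the clue to your problem. You have to unlist to get the correct result:
unlist(test$fpkm[1])
This is because fpkm is stored in the data frame as a list (rather than as an atomic vector, which is typical), as the str() output shows: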
'data.frame': 6 obs. of 2 variables:
$ gene: Factor w/ 6 levels "128up","14-3-3epsilon",..: 1 2 3 4 5 6
$ fpkm:List of 6
..$ : NULL
..$ : num 0.4
..$ : num NA
..$ : NULL
..$ : NULL
..$ : NULL
You can get the correct result with:
library(dplyr)
library(purrr)
mutate(df, level = ifelse(map_lgl(fpkm, ~ is.null(.x) || is.na(.x)), 'not_expressed', 'expressed'))
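If you would rather not load purrr, the same element-wise test can be written in base R with vapply(), which, like map_lgl(), guarantees exactly one logical per element. A sketch, assuming (as in the question's data) that every element of fpkm is either NULL or a single value, since || needs length-one operands; is_missing is a hypothetical helper name:

```r
# Rebuild the question's data; fpkm is a list column holding NULL, 0.4 and NA
df <- structure(
  list(gene = c("128up", "14-3-3epsilon", "14-3-3zeta", "140up",
                "18SrRNA-Psi:CR41602", "18SrRNA-Psi:CR45861"),
       fpkm = list(NULL, 0.4, NA_real_, NULL, NULL, NULL)),
  row.names = c(NA, 6L), class = "data.frame")

# Hypothetical helper: TRUE when an element is NULL or a single NA.
# Assumes each element has length 0 or 1, because || needs length-one operands.
is_missing <- function(x) is.null(x) || is.na(x)

# vapply() checks that every call returns exactly one logical
missing_fpkm <- vapply(df$fpkm, is_missing, logical(1))
df$level <- ifelse(missing_fpkm, "not_expressed", "expressed")
print(df$level)
```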
Answer 2 (score: 1)
Convert NULL to NA, then use ifelse:
# Convert NULL to NA
df$fpkm[ sapply(df$fpkm, is.null) ] <- NA
# I would also drop the list, it is up to you.
# df$fpkm <- unlist(df$fpkm)
# Then use ifelse as usual
df$level <- ifelse(is.na(df$fpkm), "not_expressed", "expressed")
# result
df
# gene fpkm level
# 1 128up NA not_expressed
# 2 14-3-3epsilon 0.4 expressed
# 3 14-3-3zeta NA not_expressed
# 4 140up NA not_expressed
# 5 18SrRNA-Psi:CR41602 NA not_expressed
# 6 18SrRNA-Psi:CR45861 NA not_expressed
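Converting NULL to NA first also matters if you later unlist() the column, as the commented-out line suggests: unlist() silently drops NULL elements, so the result would no longer line up with the rows. A short illustration on a stand-alone copy of the values (fpkm and flat are illustrative names):

```r
# Stand-alone copy of the question's fpkm values
fpkm <- list(NULL, 0.4, NA_real_, NULL, NULL, NULL)

# unlist() drops NULL elements, leaving only 2 values for 6 rows
n_before <- length(unlist(fpkm))
print(n_before)                 # 2

# Replace each NULL with NA, as in the answer, then unlist again
fpkm[sapply(fpkm, is.null)] <- NA
flat <- unlist(fpkm)
print(flat)                     # NA 0.4 NA NA NA NA (6 values, one per row)
```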
Answer 3 (score: 0)
As an alternative, you can try checking the length of the column values. Instead of:
mutate(df, level = ifelse(is.na(fpkm), 'not_expressed' , 'expressed'))
try:
library(dplyr)
library(stringr)
mutate(df, level = if_else(str_length(fpkm)>0, 'expressed','not_expressed'))
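The length idea can also be expressed in base R with lengths(), which returns the length of each list element: 0 for a NULL element, 1 for a single value. This is a sketch of that variant rather than the answer's stringr code, and it still needs is.na() to catch the NA row:

```r
# Rebuild the question's data; fpkm is a list column holding NULL, 0.4 and NA
df <- structure(
  list(gene = c("128up", "14-3-3epsilon", "14-3-3zeta", "140up",
                "18SrRNA-Psi:CR41602", "18SrRNA-Psi:CR45861"),
       fpkm = list(NULL, 0.4, NA_real_, NULL, NULL, NULL)),
  row.names = c(NA, 6L), class = "data.frame")

# lengths() gives the length of each list element: 0 for NULL, 1 otherwise here
len <- lengths(df$fpkm)
print(len)                      # 0 1 1 0 0 0

# An empty (NULL) element or an NA element counts as not expressed
not_expr <- len == 0 | is.na(df$fpkm)
df$level <- ifelse(not_expr, "not_expressed", "expressed")
print(df$level)
```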