Filter the columns of a data frame based on another column

Time: 2021-04-01 20:30:46

Tags: r subset

I need to filter the data frame below based on the number of samples in which each otu occurs (that is, has a non-zero count).

   samples otu1 otu2 otu3 otu4 otu5
1        a    2    1    0    0    3
2        b    2    4    1    4    3
3        c    0    0    0    1    0
4        d    0    0    1    4    4
5        e    1    2    0    2    3
6        f    1    1    2    4    2
7        g    1    0    0    4    3
8        h    0    0    2    0    4
9        i    1    2    2    1    6
10       j    0    0    2    3    4

For example, to keep only the otus that occur in >= 80% of the samples, the output would look like this:

   samples otu4 otu5
1        a    0    3
2        b    4    3
3        c    1    0
4        d    4    4
5        e    2    3
6        f    4    2
7        g    4    3
8        h    0    4
9        i    1    6
10       j    3    4

1 answer:

Answer 0: (score: 2)

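We can use select with where(). The predicate keeps the samples column plus every numeric column whose values are non-zero in at least 80% of the rows; mean(. != 0) is the fraction of non-zero entries in a column.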

library(dplyr)
df1 %>% 
    select(samples, where(~ is.numeric(.) && mean(. != 0) >= 0.8))

Output

#     samples otu4 otu5
#1        a    0    3
#2        b    4    3
#3        c    1    0
#4        d    4    4
#5        e    2    3
#6        f    4    2
#7        g    4    3
#8        h    0    4
#9        i    1    6
#10       j    3    4
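
To see why only otu4 and otu5 survive, it can help to look at the per-column fraction of non-zero samples directly. The following is a minimal base R sketch, not part of the original answer; the name prevalence is introduced here purely for illustration, and the data are copied from the question.

```r
# Sample data copied from the question
df1 <- data.frame(
  samples = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
  otu1 = c(2L, 2L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L),
  otu2 = c(1L, 4L, 0L, 0L, 2L, 1L, 0L, 0L, 2L, 0L),
  otu3 = c(0L, 1L, 0L, 1L, 0L, 2L, 0L, 2L, 2L, 2L),
  otu4 = c(0L, 4L, 1L, 4L, 2L, 4L, 4L, 0L, 1L, 3L),
  otu5 = c(3L, 3L, 0L, 4L, 3L, 2L, 3L, 4L, 6L, 4L)
)

# Drop the samples column, test each count against zero,
# and average the resulting TRUE/FALSE values column by column
prevalence <- colMeans(df1[-1] != 0)
prevalence
# otu1 otu2 otu3 otu4 otu5
#  0.6  0.5  0.6  0.8  0.9
```

Only otu4 (0.8) and otu5 (0.9) reach the 0.8 threshold, which matches the output above.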

Or, with an older dplyr version that predates where() (before 1.0.0), use select_if

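df1 %>%
   # keep character columns (samples), plus numeric columns that are
   # non-zero in at least 80% of the rows
   select_if(~ is.character(.) || (is.numeric(.) && mean(. != 0) >= 0.8))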
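
If dplyr is not available, the same filter can be written in base R. This is only a sketch under stated assumptions: keep_prevalent, id_col and min_frac are hypothetical names chosen for this example, and it assumes every column other than id_col holds numeric counts.

```r
# Sample data copied from the question
df1 <- data.frame(
  samples = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
  otu1 = c(2L, 2L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L),
  otu2 = c(1L, 4L, 0L, 0L, 2L, 1L, 0L, 0L, 2L, 0L),
  otu3 = c(0L, 1L, 0L, 1L, 0L, 2L, 0L, 2L, 2L, 2L),
  otu4 = c(0L, 4L, 1L, 4L, 2L, 4L, 4L, 0L, 1L, 3L),
  otu5 = c(3L, 3L, 0L, 4L, 3L, 2L, 3L, 4L, 6L, 4L)
)

# Hypothetical helper: keep id_col plus every other column that is
# non-zero in at least min_frac of the rows
keep_prevalent <- function(df, id_col = "samples", min_frac = 0.8) {
  otu_cols <- setdiff(names(df), id_col)
  # Fraction of non-zero entries per column
  frac <- vapply(df[otu_cols], function(x) mean(x != 0), numeric(1))
  df[c(id_col, otu_cols[frac >= min_frac])]
}

result <- keep_prevalent(df1)
names(result)
# "samples" "otu4" "otu5"
```

Making the threshold an argument means a call such as keep_prevalent(df1, min_frac = 0.6) can be used to relax the cutoff without touching the predicate.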

Data

df1 <- structure(list(samples = c("a", "b", "c", "d", "e", "f", "g", 
"h", "i", "j"), otu1 = c(2L, 2L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 
0L), otu2 = c(1L, 4L, 0L, 0L, 2L, 1L, 0L, 0L, 2L, 0L), otu3 = c(0L, 
1L, 0L, 1L, 0L, 2L, 0L, 2L, 2L, 2L), otu4 = c(0L, 4L, 1L, 4L, 
2L, 4L, 4L, 0L, 1L, 3L), otu5 = c(3L, 3L, 0L, 4L, 3L, 2L, 3L, 
4L, 6L, 4L)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10"))