Question

For example, I have a data frame with many gene columns and many rows:

id  treatment   time    gene1   gene2   gene3   …
1   A   1   2   0   2   …
2   A   2   0   0   3   …
3   A   3   0   0   4   …
4   B   4   0   0   0   …
5   B   5   0   0   2   …
6   B   3   1   0   1   …
7   C   5   0   0   2   …

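I want to keep all columns (there are three gene columns in the sample data, but many more in the real data) where the sum of the gene columns is greater than 0.

I would appreciate any help. Thank you very much!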

Answer 1

I'm not sure whether you want to keep the rows or the columns whose entries sum to more than 0.

For the former, you can use rowSums like this:

df[rowSums(df[, grep("gene", names(df))]) > 0, ]
#  id treatment time gene1 gene2 gene3
#1  1         A    1     2     0     2
#2  2         A    2     0     0     3
#3  3         A    3     0     0     4
#5  5         B    5     0     0     2
#6  6         B    3     1     0     1
#7  7         C    5     0     0     2
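
As a slightly more defensive variation on the rowSums call above, here is a minimal sketch. It assumes that gene column names start with "gene" (hence the anchored pattern "^gene") and that any missing values should be ignored; neither assumption comes from the question. The names gene_cols, keep_rows and df_rows are only illustrative.

```r
# Sample data, identical to the sample data given in this answer
df <- read.table(text =
    "id  treatment   time    gene1   gene2   gene3
1   A   1   2   0   2
2   A   2   0   0   3
3   A   3   0   0   4
4   B   4   0   0   0
5   B   5   0   0   2
6   B   3   1   0   1
7   C   5   0   0   2", header = TRUE)

# Assumption: gene columns are exactly those whose names start with "gene"
gene_cols <- grep("^gene", names(df))

# drop = FALSE keeps a data frame even if only one gene column matches;
# na.rm = TRUE ignores missing values (an assumption about the real data)
keep_rows <- rowSums(df[, gene_cols, drop = FALSE], na.rm = TRUE) > 0

df_rows <- df[keep_rows, ]
df_rows$id
```

With the sample data this drops only the row with id 4, the one row in which every gene column is 0.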

Or, to keep only those columns whose entries sum to more than 0, you can use colSums:

df[, names(df) %in% c(
    # all non-gene columns
    names(df)[grep("gene", names(df), invert = TRUE)],
    # gene columns whose sum is greater than 0
    names(which(colSums(df[, grep("gene", names(df))]) > 0)))]
#  id treatment time gene1 gene3
#1  1         A    1     2     2
#2  2         A    2     0     3
#3  3         A    3     0     4
#4  4         B    4     0     0
#5  5         B    5     0     2
#6  6         B    3     1     1
#7  7         C    5     0     2

This assumes that all gene columns contain the word "gene" (and that no non-gene column contains the word "gene").
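
The same name-based column selection can also be written with a logical mask, which may be easier to read than the nested names() calls. This is only a sketch: it assumes gene column names start with "gene", and the variable names is_gene, keep and df_cols are hypothetical.

```r
# Sample data, identical to the sample data given in this answer
df <- read.table(text =
    "id  treatment   time    gene1   gene2   gene3
1   A   1   2   0   2
2   A   2   0   0   3
3   A   3   0   0   4
4   B   4   0   0   0
5   B   5   0   0   2
6   B   3   1   0   1
7   C   5   0   0   2", header = TRUE)

# TRUE for gene columns (assumption: their names start with "gene")
is_gene <- grepl("^gene", names(df))

# Start by keeping every non-gene column ...
keep <- !is_gene
# ... then keep a gene column only if its sum is greater than 0
keep[is_gene] <- colSums(df[, is_gene, drop = FALSE], na.rm = TRUE) > 0

df_cols <- df[, keep, drop = FALSE]
names(df_cols)
```

With the sample data only gene2 is dropped, since it is the only gene column that sums to 0.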

Or, more concisely (thanks to @Shree):

df[, c(rep(TRUE, 3), colSums(df[, -c(1:3)]) > 0)]

This assumes that the first 3 columns are the non-gene columns (and that all remaining columns are gene columns).
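
If the number of leading non-gene columns differs between data sets, the positional approach can be wrapped in a small helper. The function name keep_nonzero_gene_cols and its n_meta parameter are hypothetical and not part of the original answer; this is a sketch that still relies on the non-gene columns coming first.

```r
# Sample data, identical to the sample data given in this answer
df <- read.table(text =
    "id  treatment   time    gene1   gene2   gene3
1   A   1   2   0   2
2   A   2   0   0   3
3   A   3   0   0   4
4   B   4   0   0   0
5   B   5   0   0   2
6   B   3   1   0   1
7   C   5   0   0   2", header = TRUE)

# Hypothetical helper: keeps the first n_meta columns unconditionally and
# every later column only if its sum is greater than 0
keep_nonzero_gene_cols <- function(data, n_meta = 3) {
  is_gene <- seq_along(data) > n_meta   # columns after the first n_meta
  keep <- !is_gene
  keep[is_gene] <- colSums(data[, is_gene, drop = FALSE]) > 0
  data[, keep, drop = FALSE]
}

result <- keep_nonzero_gene_cols(df, n_meta = 3)
names(result)
```

Using a logical mask rather than negative indices avoids the surprising behaviour of an empty negative index, which selects no columns at all instead of all of them.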

Sample data

df <- read.table(text =
    "id  treatment   time    gene1   gene2   gene3
1   A   1   2   0   2
2   A   2   0   0   3
3   A   3   0   0   4
4   B   4   0   0   0
5   B   5   0   0   2
6   B   3   1   0   1
7   C   5   0   0   2", header = TRUE)
