Filter out rows that have 0 counts in every column of a count matrix, then add a pseudocount to the remaining columns?

Time: 2014-10-27 09:23:29

Tags: r bash shell sh

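I want to remove the rows of my count matrix that have a count of 0 in every column (sample), and then add an integer (a pseudocount) to all the remaining values. The main problem is that I have about 36 columns plus 1 id column, and I can't extend the code below, which I wrote for another matrix with 6 columns, to cover that many.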

awk -F "\t" '{ if ($2>0 || $3>0 || $4>0 || $5>0 || $6>0 || $7>0) print $1"\t"$2+1"\t"$3+1"\t"$4+1"\t"$5+1"\t"$6+1"\t"$7+1 }' matrix.txt > final_matrix_nonzero_1pseudoCounts.txt

Example input:

id      c1   c2   c3   t1   t2   t3
gene1    0    0    1    0    0    1
gene2    0    0    0    0    0    0   # should be removed: gene2 has 0 in every sample
gene3    1    1   23   45    5    0

Then add 1 to every value in the remaining matrix (final matrix):

id      c1   c2   c3   t1   t2   t3
gene1    1    1    2    1    1    2
gene3    2    2   24   46    6    1

1 answer:

Answer 0 (score: 1)

In R, you could do the following, which works for any number of count columns:

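# Logical index of the count columns (everything except "id")
indx <- !grepl("id", colnames(df))

# Keep rows with at least one non-zero count (counts are non-negative)
df1 <- df[rowSums(df[, indx]) != 0, ]
# Add a pseudocount of 1 to every count column
df1[, indx] <- df1[, indx] + 1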

df1
#     id c1 c2 c3 t1 t2 t3
#1 gene1  1  1  2  1  1  2
#3 gene3  2  2 24 46  6  1

Data

df <- structure(list(id = c("gene1", "gene2", "gene3"), c1 = c(0L, 
0L, 1L), c2 = c(0L, 0L, 1L), c3 = c(1L, 0L, 23L), t1 = c(0L, 
0L, 45L), t2 = c(0L, 0L, 5L), t3 = c(1L, 0L, 0L)), .Names = c("id", 
"c1", "c2", "c3", "t1", "t2", "t3"), class = "data.frame", row.names = c(NA, 
-3L))
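
For the shell side of the question, the same filter-then-add logic can probably be written in awk with a loop over the fields, so the command does not depend on the number of columns. This is only a minimal sketch under some assumptions: the input is tab-separated, it has exactly one header line, the id is in column 1, and every other column holds a non-negative count. The file names come from the question; the `pseudo` and `keep` variable names are made up for illustration, and the first four lines only recreate the example matrix.

```shell
# Recreate the example matrix from the question as a tab-separated file.
printf 'id\tc1\tc2\tc3\tt1\tt2\tt3\n'  > matrix.txt
printf 'gene1\t0\t0\t1\t0\t0\t1\n'    >> matrix.txt
printf 'gene2\t0\t0\t0\t0\t0\t0\n'    >> matrix.txt
printf 'gene3\t1\t1\t23\t45\t5\t0\n'  >> matrix.txt

# pseudo is a hypothetical variable name holding the pseudocount to add.
awk -F '\t' -v OFS='\t' -v pseudo=1 '
NR == 1 { print; next }            # assumption: line 1 is a header, pass it through
{
    keep = 0
    for (i = 2; i <= NF; i++)      # scan every count column, however many there are
        if ($i > 0) { keep = 1; break }
    if (!keep) next                # drop rows that are 0 in every sample
    for (i = 2; i <= NF; i++)      # add the pseudocount to each count column
        $i += pseudo
    print
}' matrix.txt > final_matrix_nonzero_1pseudoCounts.txt
```

Because the loops run from field 2 to `NF`, the same command should handle 6 or 36 sample columns unchanged. If the real file has no header line, the `NR == 1` rule would need to be removed.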