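I want to filter out the rows of my count matrix that have a count of 0 in every column (sample). Then I want to add an integer (a pseudocount) to the remaining columns. The main problem is that I have about 36 columns plus 1 id column, and I can't scale up the code below, which I wrote for another matrix with 6 columns.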
cat matrix.txt | awk -F "\t" '{if ($2>0 || $3>0 || $4>0 || $5>0 || $6>0 || $7>0 )print $1"\t"$2+1"\t" $3+1"\t"$4+1"\t"$5+1"\t"$6+1"\t"$7+1"\t" }' > final_matrix_nonzero_1pseudoCounts.txt
Example input:
id c1 c2 c3 t1 t2 t3
gene1 0 0 1 0 0 1
gene2 0 0 0 0 0 0 # should be removed: gene2 has 0 in every sample
gene3 1 1 23 45 5 0
Then add 1 to every count in the remaining rows (final matrix):
id c1 c2 c3 t1 t2 t3
gene1 1 1 2 1 1 2
gene3 2 2 24 46 6 1
Answer 0 (score: 1)
In R, you can do:
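# Select every column except the id column
indx <- !grepl("id", colnames(df))
# Keep rows whose counts do not sum to zero (counts are non-negative,
# so a zero sum means every sample is 0)
df1 <- df[rowSums(df[, indx]) != 0, ]
# Add a pseudocount of 1 to the count columns
df1[, indx] <- df1[, indx] + 1
df1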
# id c1 c2 c3 t1 t2 t3
#1 gene1 1 1 2 1 1 2
#3 gene3 2 2 24 46 6 1
# Data used above
df <- structure(list(id = c("gene1", "gene2", "gene3"), c1 = c(0L,
0L, 1L), c2 = c(0L, 0L, 1L), c3 = c(1L, 0L, 23L), t1 = c(0L,
0L, 45L), t2 = c(0L, 0L, 5L), t3 = c(1L, 0L, 0L)), .Names = c("id",
"c1", "c2", "c3", "t1", "t2", "t3"), class = "data.frame", row.names = c(NA,
-3L))
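
If you would rather stay with awk, looping over the fields avoids spelling out all 36 columns by hand. The following is only a minimal sketch, assuming the file is tab-separated (as in the question's `-F "\t"`), the first line is a header, and the id is in column 1. The function name `filter_pseudocount` is made up for illustration.

```shell
# Sketch: keep rows where at least one sample column is > 0,
# then add a pseudocount of 1 to every sample column.
# Assumptions: tab-separated input, header on line 1, id in column 1.
filter_pseudocount() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == 1 { print; next }          # pass the header through unchanged
       {
         keep = 0
         for (i = 2; i <= NF; i++)      # works for any number of columns
           if ($i > 0) { keep = 1; break }
         if (!keep) next                # every sample is 0: drop the row
         for (i = 2; i <= NF; i++) $i += 1
         print
       }' "$1"
}
```

It could then be called as `filter_pseudocount matrix.txt > final_matrix_nonzero_1pseudoCounts.txt`. If the file has no header line, the `NR == 1` rule would need to be removed; a different pseudocount only requires changing the `+= 1`.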