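Calculate all possible row ratios in a data frame

Time: 2019-06-05 18:31:14

Tags: r

I have a data frame with genes as rows and samples as columns, and I want to calculate all possible ratios between the rows. The new row names should indicate which genes each ratio was calculated from. Any hints on how to get started?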

        sample1 sample2 sample3
gene1   2       23      323
gene2   23      53      56
gene3   565     55      13

Desired output:

             sample1 sample2 sample3
gene1_gene2  2/23    23/53   323/56
gene1_gene3  2/565   23/55   323/13
gene2_gene3  23/565  53/55   56/13

2 answers:

Answer 0: (score: 0)

One option is to use combn from base R:

out <- do.call(rbind, combn(rownames(df1), 2, FUN = function(x) 
       apply(df1[x, ], 2, paste, collapse="/"), simplify = FALSE))
row.names(out) <- combn(rownames(df1), 2, FUN = paste, collapse="_")
out
#           sample1  sample2 sample3 
#gene1_gene2 "2/23"   "23/53" "323/56"
#gene1_gene3 "2/565"  "23/55" "323/13"
#gene2_gene3 "23/565" "53/55" "56/13" 

If you need the actual numeric values, replace paste with a function that divides the first element by the second:

out <- do.call(rbind, combn(rownames(df1), 2, FUN = function(x) 
   apply(df1[x, ], 2, FUN = function(x) x[1]/x[2]), simplify = FALSE))
row.names(out) <- combn(rownames(df1), 2, FUN = paste, collapse="_")
out
#                sample1   sample2   sample3
#gene1_gene2 0.086956522 0.4339623  5.767857
#gene1_gene3 0.003539823 0.4181818 24.846154
#gene2_gene3 0.040707965 0.9636364  4.307692
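The same numeric result can probably be obtained without apply by letting combn generate row indices and dividing the selected rows as matrices. This is only a sketch, not part of the original answer: the names idx, m, and ratios are made up for illustration, and it assumes every column of df1 is numeric.

```r
# Example data copied from the question
df1 <- data.frame(
  sample1 = c(2, 23, 565),
  sample2 = c(23, 53, 55),
  sample3 = c(323, 56, 13),
  row.names = c("gene1", "gene2", "gene3")
)

# Each column of idx holds the row indices of one unordered pair of genes
idx <- combn(nrow(df1), 2)

# Work on a numeric matrix so the division is plain element-wise arithmetic
m <- as.matrix(df1)

# Numerator rows divided by denominator rows, one output row per pair
ratios <- m[idx[1, ], , drop = FALSE] / m[idx[2, ], , drop = FALSE]

# Label each row with the two genes it was computed from
rownames(ratios) <- paste(rownames(m)[idx[1, ]], rownames(m)[idx[2, ]], sep = "_")
ratios
```

Because the division happens once on whole matrices rather than once per pair, this form may scale better when there are many genes, although the number of pairs still grows quadratically.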

Data

df1 <- structure(list(sample1 = c(2L, 23L, 565L), sample2 = c(23L, 53L, 
55L), sample3 = c(323L, 56L, 13L)), class = "data.frame", 
row.names = c("gene1", 
"gene2", "gene3"))

Answer 1: (score: 0)

I think you can make good use of the outer function here. Something like the following, even if it is not very elegant:

library(magrittr) # just for the %>% pipe...
df <- data.frame(gene=c("G1", "G2", "G3"), s1=runif(3), s2=runif(3), s3=runif(3))

dfnew <- data.frame(
  comb    = outer(df$gene, df$gene, paste, sep = "_") %>% as.vector,
  s1ratio = outer(df$s1, df$s1, "/") %>% as.vector
)

dfnew

   comb   s1ratio
1 G1_G1 1.0000000
2 G2_G1 2.7052199
3 G3_G1 1.3417876
4 G1_G2 0.3696557
5 G2_G2 1.0000000
6 G3_G2 0.4959995
7 G1_G3 0.7452744
8 G2_G3 2.0161312
9 G3_G3 1.0000000
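The output above contains every ordered pair, including each gene against itself, and covers only s1. One way to get closer to the layout requested in the question might be to keep only the upper triangle of each outer result and repeat the division for every sample column. This is a sketch under assumptions: the seed and the names keep, labels, samples, and ratios are illustrative and not from the original answer.

```r
set.seed(1)  # assumed seed, only so the random example data is reproducible
df <- data.frame(gene = c("G1", "G2", "G3"),
                 s1 = runif(3), s2 = runif(3), s3 = runif(3))

# TRUE where row index < column index, so each unordered pair appears once
keep <- upper.tri(diag(nrow(df)))

# Element [i, j] of this outer() is "gene_i_gene_j"
genes  <- as.character(df$gene)
labels <- outer(genes, genes, paste, sep = "_")[keep]

# Element [i, j] of outer(x, x, "/") is x[i] / x[j]; repeat for each sample
samples <- setdiff(names(df), "gene")
ratios  <- sapply(df[samples], function(x) outer(x, x, "/")[keep])
rownames(ratios) <- labels
ratios
```

With three genes this yields a 3 x 3 matrix whose rows are G1_G2, G1_G3, and G2_G3. Note that sapply would return a plain vector rather than a matrix if there were only one pair.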