Collapsing duplicates in variable names

Time: 2015-05-18 11:27:11

Tags: r

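I have a matrix of gene names with their expression values in different tissues. However, the analyses were run independently, and not all genes are present in all tissues. The gene lists for each tissue were simply pasted underneath each other. It now looks like this: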

 GeneName   Tissue A   Tissue B
 Gene A     1          -
 Gene B     1          -
 Gene C     2          -
 Gene A     -          3
 Gene D     -          2

I want to collapse the duplicated gene names so that I get a matrix like this:

 GeneName   Tissue A   Tissue B
 Gene A     1          3
 Gene B     1          -
 Gene C     2          -
 Gene D     -          2

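Edit: Thanks for your answers. However, I forgot to mention that the gene names are in their own column, and the row names are just the numbers 1-n. I tried to set the name column as the row names with row.names(mydataframe) <- mydataframe$GeneName, but got the following error message: Error in row.names<-.data.frame(tmp … As far as I understand, I cannot use a column with non-unique values as row names. If I need the rows to be named after the gene name column in order to collapse the matrix, that seems to leave me in a catch-22?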

1 answer:

Answer 0: (score: 3)

Assuming the missing values are 'NA' and the 'Tissue.B' value for 'Gene D' in the output is 2, you can use

 res <- rowsum(m1, row.names(m1), na.rm=TRUE)
 is.na(res) <- res==0
 res
 #       Tissue.A Tissue.B
 #Gene A        1        3
 #Gene B        1       NA
 #Gene C        2       NA
 #Gene D       NA        2
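
One caveat with the step above: setting every 0 to NA also hides a gene whose measured values genuinely sum to 0. A minimal sketch of a variant that only marks a cell as NA when no value was observed for that gene and tissue. The matrix `m2` is hypothetical data (the same layout as `m1`, but with a real 0 for 'Gene B'), and `n_obs` is a name introduced here for illustration:

```r
# Hypothetical data: same layout as m1, but 'Gene B' has a genuine 0 in Tissue.A
m2 <- matrix(c(1L, 0L, 2L, NA, NA,
               NA, NA, NA, 3L, 2L),
             ncol = 2,
             dimnames = list(c("Gene A", "Gene B", "Gene C", "Gene A", "Gene D"),
                             c("Tissue.A", "Tissue.B")))

# Sum per gene, ignoring missing values (all-missing groups come out as 0)
sums <- rowsum(m2, row.names(m2), na.rm = TRUE)

# Count the non-missing values per gene and tissue
n_obs <- rowsum((!is.na(m2)) * 1L, row.names(m2))

# Only cells with no observation at all become NA; real zeros are kept
sums[n_obs == 0] <- NA
sums
```

Both `rowsum()` calls use the same grouping and the default ordering, so `sums` and `n_obs` line up cell by cell.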

If it is a data.frame with 'GeneName' as a column:

 library(dplyr)
 df1 %>%
    group_by(GeneName) %>% 
    summarise_each(funs(sum=sum(., na.rm=TRUE)))
 #    GeneName Tissue.A Tissue.B
 #1   Gene A        1        3
 #2   Gene B        1        0
 #3   Gene C        2        0
 #4   Gene D        0        2

We can replace the 0 values with NA as before.
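
In current dplyr releases, `summarise_each()` and `funs()` are deprecated in favour of `across()`. A minimal sketch of the same grouping with the newer interface, assuming dplyr 1.0.0 or later is installed. `df1` is rebuilt here so the block runs on its own, and `res_df` is a name introduced for illustration:

```r
library(dplyr)

# Same example data as 'df1' in the Data section
df1 <- data.frame(
  GeneName = c("Gene A", "Gene B", "Gene C", "Gene A", "Gene D"),
  Tissue.A = c(1L, 1L, 2L, NA, NA),
  Tissue.B = c(NA, NA, NA, 3L, 2L)
)

res_df <- df1 %>%
  group_by(GeneName) %>%
  # Sum every non-grouping column per gene, ignoring missing values
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE))) %>%
  # Turn the 0 produced by all-missing groups back into NA
  mutate(across(-GeneName, ~ na_if(.x, 0L)))

res_df
```

As with the matrix version, `na_if()` cannot tell a genuine sum of 0 from an all-missing group, so this is only appropriate when real zeros do not occur.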

Or use aggregate from base R:

  aggregate(.~GeneName, df1, sum, na.rm=TRUE, na.action=NULL)
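
Regarding the row-name problem raised in the question's edit: `rowsum()` takes the grouping as a separate vector, so the gene names never have to become (unique) row names. A minimal sketch, assuming every column other than `GeneName` is numeric. `df1` is rebuilt here so the block runs on its own, and `tissue_cols` is a name introduced for illustration:

```r
# Same example data as 'df1' in the Data section
df1 <- data.frame(
  GeneName = c("Gene A", "Gene B", "Gene C", "Gene A", "Gene D"),
  Tissue.A = c(1L, 1L, 2L, NA, NA),
  Tissue.B = c(NA, NA, NA, 3L, 2L)
)

# All columns except the gene name column hold expression values
tissue_cols <- setdiff(names(df1), "GeneName")

# Group by the GeneName column directly; no row names are needed
res <- rowsum(df1[tissue_cols], group = df1$GeneName, na.rm = TRUE)

# All-missing groups sum to 0; mark them as NA
res[res == 0] <- NA
res
```

The result carries the gene names as row names, which is fine here because they are unique after collapsing. If a regular column is preferred, `data.frame(GeneName = rownames(res), res, row.names = NULL)` moves them back.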

Data

 m1 <- structure(c(1L, 1L, 2L, NA, NA, NA, NA, NA, 3L, 2L), .Dim = c(5L, 
 2L), .Dimnames = list(c("Gene A", "Gene B", "Gene C", "Gene A", 
"Gene D"), c("Tissue.A", "Tissue.B")))

 df1 <- structure(list(GeneName = c("Gene A", "Gene B", "Gene C",
  "Gene A", 
 "Gene D"), Tissue.A = c(1L, 1L, 2L, NA, NA), Tissue.B = c(NA, 
 NA, NA, 3L, 2L)), .Names = c("GeneName", "Tissue.A", "Tissue.B"
 ), class = "data.frame", row.names = c(NA, -5L))