R: Reshaping a list into a matrix

Time: 2012-04-03 15:13:58

Tags: r

I have a list structured like my.list[[file.id]][[value.id]] <- a value (1 or 0). The same value.id can appear under different file.ids.

I need a matrix whose rownames are all the value.ids, whose colnames are the file.ids, and where each cell is my.list[[file.id]][[value.id]].

Is there a quick way to do this without iterating like crazy?

Sample data:

List:

$`Zhou_et_al_2004`
  CDC42:P60953   CDK2D:NONAME  MAPK12:P53778    E2F3:NONAME    GRB2:P62424    GRB2:P62993     RFA:NONAME 
          "up"           "up"         "down"         "down"         "down"         "down"         "down" 
   CDK9:P50750 JUP/DP3:NONAME    MEK1:NONAME   RFC38:NONAME     DP2:NONAME   RFC37:NONAME  GADD45:NONAME 
        "down"         "down"         "down"         "down"         "down"         "down"         "down" 

$`Zhou_et_al_2006`
   CTTN:Q14247   GTSE1:Q9NYZ3     CHST11:Q9N     CHST11:PF2  TNRC6A:Q8NDV7    MMP9:P14780      NRIP3:Q9N 
          "up"           "up"           "up"           "up"           "up"           "up"           "up" 
     NRIP3:Q35    EGFR:P00533   GFPT2:NONAME   TPCN2:Q8NHX9     BBP:NONAME    SQLE:Q14534   DISP2:NONAME 
          "up"           "up"           "up"           "up"           "up"           "up"           "up" 
  PAPPA:Q13219    BMP2:P12643    PCM1:Q15154  SUCLG2:Q96I99   ASAH1:Q13510  UQCRC2:P22695   MTUS1:NONAME 
          "up"           "up"         "down"         "down"         "down"         "down"         "down" 
  MUC20:NONAME   FRAT2:NONAME PLA2G4A:P47712 
        "down"         "down"         "down" 

$`Zhou_et_al_2007`
    CTTN:Q14247    GTSE1:Q9NYZ3      CHST11:Q9N      CHST11:PF2   TNRC6A:Q8NDV7       NRIP3:Q9N 
           "up"            "up"            "up"            "up"            "up"            "up" 
      NRIP3:Q35    USP32:Q8NFA0  PPFIBP1:Q86W92   MALAT1:NONAME    TRA2A:NONAME MGC17624:NONAME 
           "up"            "up"            "up"            "up"            "up"            "up" 
  SLC6A2:P23975    USP42:Q9H9J4    RASEF:NONAME   SEMA3C:Q99985     NDE1:Q9NXR1     TRA1:NONAME 
           "up"            "up"            "up"            "up"            "up"            "up" 
  PPFIA1:Q13136   PPFIA1:Q16787    ITGA9:Q13797    ITGA9:Q14469     LMO2:P25791    NR2F2:P24468 
           "up"            "up"          "down"          "down"          "down"          "down" 
KIAA0882:NONAME     PCM1:Q15154     CYB5:NONAME     IDH1:NONAME    MYLIP:Q8WY64    ASAH1:Q13510 
         "down"          "down"          "down"          "down"          "down"          "down" 
  HADHSC:NONAME   FAM84B:Q96KN1     ADH5:P11766     NTN4:Q9HB63      AK3:Q9UIJ7    MTUS1:NONAME 
         "down"          "down"          "down"          "down"          "down"          "down" 
KIAA1815:NONAME 
         "down" 

MATRIX:

                Zhou2004 Zhou2006 Zhou2007
CDC42:P60953    "up"     NA       NA      
CDK2D:NONAME    "up"     NA       NA      
MAPK12:P53778   "down"   NA       NA      
E2F3:NONAME     "down"   NA       NA      
GRB2:P62424     "down"   NA       NA      
GRB2:P62993     "down"   NA       NA      
RFA:NONAME      "down"   NA       NA      
CDK9:P50750     "down"   NA       NA      
JUP/DP3:NONAME  "down"   NA       NA      
MEK1:NONAME     "down"   NA       NA      
RFC38:NONAME    "down"   NA       NA      
DP2:NONAME      "down"   NA       NA      
RFC37:NONAME    "down"   NA       NA      
GADD45:NONAME   "down"   NA       NA      
CTTN:Q14247     NA       "up"     "up"    
GTSE1:Q9NYZ3    NA       "up"     "up"    
CHST11:Q9N      NA       "up"     "up"    
CHST11:PF2      NA       "up"     "up"    

etc. (there will be more rows)

2 answers:

Answer 0 (score: 4)

ldply from the plyr package is particularly useful for this kind of task. From the doc:

  

The most unambiguous behaviour is achieved when .fun returns a data frame - in that case pieces will be combined with rbind.fill.

where rbind.fill is a handy function that binds data.frames by row and fills missing columns with NA.

So the trick here is to apply a function that converts each list element into a data.frame:

my.list <- list()
my.list[["Zhou_et_al_2004"]]["CDC42:P60953"] <- 1
my.list[["Zhou_et_al_2004"]]["CDK2D:NONAME"] <- 2
my.list[["Zhou_et_al_2006"]]["CTTN:Q14247"]  <- 3
my.list[["Zhou_et_al_2006"]]["GTSE1:Q9NYZ3"] <- 4
my.list[["Zhou_et_al_2006"]]["CHST11:Q9N"]   <- 5

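library(plyr)
# Turn each named vector into a one-row data frame; ldply stacks the rows
# with rbind.fill, so columns missing from a row are filled with NA
ldply(my.list, .fun = function(x) as.data.frame(as.list(x)))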
#               .id CDC42.P60953 CDK2D.NONAME CTTN.Q14247 GTSE1.Q9NYZ3 CHST11.Q9N
# 1 Zhou_et_al_2004            1            2          NA           NA         NA
# 2 Zhou_et_al_2006           NA           NA           3            4          5

I'm sure you'll know how to convert this to the final format.
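
One possible way to finish, sketched in base R under a few assumptions: the ldply result shown above is rebuilt by hand here as `wide` (a hypothetical variable name) so the snippet runs without plyr, and the column names are the dotted ones from that output. Dropping the .id column, transposing, and reusing .id for the column names gives the layout the question asks for.

```r
# Hand-built copy of the ldply output shown above
# (assumption: in practice `wide` would hold the value returned by ldply)
wide <- data.frame(
  .id          = c("Zhou_et_al_2004", "Zhou_et_al_2006"),
  CDC42.P60953 = c(1, NA),
  CDK2D.NONAME = c(2, NA),
  CTTN.Q14247  = c(NA, 3),
  GTSE1.Q9NYZ3 = c(NA, 4),
  CHST11.Q9N   = c(NA, 5),
  stringsAsFactors = FALSE
)

# Drop the id column, convert to a matrix, and transpose so value ids become rows
m <- t(as.matrix(wide[, -1]))
# File ids become the column names
colnames(m) <- as.character(wide$.id)
m
```

The row names carry dots rather than colons because as.data.frame made the names syntactic; this sketch does not try to restore them.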

Answer 1 (score: 2)

Starting with @flodel's sample data,

my.list <- list()
my.list[["Zhou_et_al_2004"]]["CDC42:P60953"] <- 1
my.list[["Zhou_et_al_2004"]]["CDK2D:NONAME"] <- 2
my.list[["Zhou_et_al_2006"]]["CTTN:Q14247"]  <- 3
my.list[["Zhou_et_al_2006"]]["GTSE1:Q9NYZ3"] <- 4
my.list[["Zhou_et_al_2006"]]["CHST11:Q9N"]   <- 5
my.list[["Zhou_et_al_2009"]]["CTTN:Q14247"]  <- 6

put each element of the list into a data frame,

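a <- lapply(seq_along(my.list), function(i) {
  x <- my.list[[i]]
  # one row per value id; the second column holds this file's values
  out <- data.frame(name=names(x), out=x)
  # name the value column after the file id
  names(out)[2] <- names(my.list)[[i]]
  out
})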

merge all the data frames together,

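# merge joins on the shared "name" column; all=TRUE keeps ids found in only one frame (NA elsewhere)
out <- Reduce(function(x,y) { merge(x, y, all=TRUE) }, a)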

and fix the rownames.

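rownames(out) <- out[,1]          # use the id column as row names
out <- out[,-1, drop=FALSE]       # then drop it; drop=FALSE keeps a data frame even with one file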

Here is the result!

> out
             Zhou_et_al_2004 Zhou_et_al_2006 Zhou_et_al_2009
CDC42:P60953               1              NA              NA
CDK2D:NONAME               2              NA              NA
CHST11:Q9N                NA               5              NA
CTTN:Q14247               NA               3               6
GTSE1:Q9NYZ3              NA               4              NA
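
A more direct route, offered as a sketch rather than as what either answer proposes: collect every value id, index each named vector by that full set so that absent ids come back as NA, and bind the results column-wise. It uses only base R and keeps the original names. The sample values are a few of the "up"/"down" entries from the question; `ids` and `m` are names chosen here for illustration.

```r
# A few entries taken from the question's sample list
my.list <- list(
  Zhou_et_al_2004 = c("CDC42:P60953" = "up", "MAPK12:P53778" = "down"),
  Zhou_et_al_2006 = c("CTTN:Q14247" = "up", "PCM1:Q15154" = "down"),
  Zhou_et_al_2007 = c("CTTN:Q14247" = "up", "PCM1:Q15154" = "down")
)

# All value ids, in order of first appearance
ids <- unique(unlist(lapply(my.list, names)))

# Index every vector by the full id set (missing ids yield NA),
# then bind the vectors as columns named after the file ids
m <- do.call(cbind, lapply(my.list, function(x) unname(x[ids])))
rownames(m) <- ids
m
```

The result is a true matrix (character here, numeric for 1/0 values). If a value id occurs more than once within a single file, name indexing picks the first occurrence. The merged data frame `out` from this answer can also be turned into a matrix with as.matrix(out).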