Merging data.frames in a list: how to select multiple elements

Time: 2013-07-05 12:44:25

Tags: r list merge

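I have a list of data frames, each with a different number of rows, that I want to merge. There is a lovely solution for merging multiple data frames, which I have used and which works: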

> go.sigtop.l[c(1:3)]
$SRSF1_cyto
                                                   GoTerm       PValue Fold.Enrichment
1                                   lipid kinase activity 0.0044501957        5.378668
2 general RNA polymerase II transcription factor activity 0.0070975052        4.840801
3                      protein methyltransferase activity 0.0022675162        4.302935
4                            N-methyltransferase activity 0.0089131138        3.850638
5                          structure-specific DNA binding 0.0002666942        3.821685
6                  purine NTP-dependent helicase activity 0.0007861753        3.377303

$SRSF1_total
                                             GoTerm       PValue Fold.Enrichment
1 translation factor activity, nucleic acid binding 1.460691e-04        6.953428
2                structural constituent of ribosome 8.530549e-03        3.948718
3                                       RNA binding 3.479534e-09        3.675900
4                                nucleotide binding 9.800564e-04        1.638817

$SRSF2_cyto
                                       GoTerm      PValue Fold.Enrichment
1 protein-lysine N-methyltransferase activity 0.001722436       16.486352
2         lysine N-methyltransferase activity 0.001722436       16.486352
3 histone-lysine N-methyltransferase activity 0.001722436       16.486352
4          histone methyltransferase activity 0.003756630       12.607211
5                N-methyltransferase activity 0.007775608        9.741935
6          protein methyltransferase activity 0.008275521        9.525448

> merge.all <- function(by, ...) {
+   frames <- list(...)
+   df <- Reduce(function(x, y) { merge(x, y, by = by, all = TRUE) }, frames)
+   names(df) <- c(by, paste("V", seq(length(frames)), sep = ""))
+   
+   return(df)
+ }
> go.df <- merge.all("GoTerm", go.sigtop.l[[1]], go.sigtop.l[[2]], go.sigtop.l[[3]])
> go.df
                                                    GoTerm           V1       V2           V3       NA          NA        NA
1  general RNA polymerase II transcription factor activity 0.0070975052 4.840801           NA       NA          NA        NA
2              histone-lysine N-methyltransferase activity           NA       NA           NA       NA 0.001722436 16.486352
3                       histone methyltransferase activity           NA       NA           NA       NA 0.003756630 12.607211
4                                    lipid kinase activity 0.0044501957 5.378668           NA       NA          NA        NA
5                      lysine N-methyltransferase activity           NA       NA           NA       NA 0.001722436 16.486352
6                             N-methyltransferase activity 0.0089131138 3.850638           NA       NA 0.007775608  9.741935
7                                       nucleotide binding           NA       NA 9.800564e-04 1.638817          NA        NA
8              protein-lysine N-methyltransferase activity           NA       NA           NA       NA 0.001722436 16.486352
9                       protein methyltransferase activity 0.0022675162 4.302935           NA       NA 0.008275521  9.525448
10                  purine NTP-dependent helicase activity 0.0007861753 3.377303           NA       NA          NA        NA
11                                             RNA binding           NA       NA 3.479534e-09 3.675900          NA        NA
12                      structural constituent of ribosome           NA       NA 8.530549e-03 3.948718          NA        NA
13                          structure-specific DNA binding 0.0002666942 3.821685           NA       NA          NA        NA
14       translation factor activity, nucleic acid binding           NA       NA 1.460691e-04 6.953428          NA        NA
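The NA column headers come from the last line of merge.all: it supplies one name per data frame (V1, V2, V3), but each frame contributes two value columns (PValue and Fold.Enrichment), so R pads the too-short name vector with NA. It also means V1 to V3 do not correspond to frames (V1 and V2 are both from SRSF1_cyto). A minimal sketch on two hypothetical toy frames (f1, f2, with a shortened Fold column) reproduces this and builds one name per value column instead, assuming every frame has the same value columns in the same order:

```r
# Two hypothetical toy frames, each with two value columns like the GO tables
f1 <- data.frame(GoTerm = c("x", "y"), PValue = c(0.01, 0.02), Fold = c(2, 3))
f2 <- data.frame(GoTerm = c("y", "z"), PValue = c(0.03, 0.04), Fold = c(4, 5))
m <- merge(f1, f2, by = "GoTerm", all = TRUE)
print(ncol(m))  # 5: GoTerm plus two value columns per frame

# What merge.all does: one name per frame, so the vector is too short
bad <- m
names(bad) <- c("GoTerm", paste("V", seq(2), sep = ""))
print(names(bad))  # "GoTerm" "V1" "V2" NA NA

# One name per value column instead (assumes identical value columns in every frame)
value.names <- setdiff(names(f1), "GoTerm")
n.frames <- 2
names(m) <- c("GoTerm",
              paste(rep(value.names, times = n.frames),
                    rep(seq(n.frames), each = length(value.names)), sep = "."))
print(names(m))  # "GoTerm" "PValue.1" "Fold.1" "PValue.2" "Fold.2"
```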

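The problem is that the number of data frames in the list will vary. How can I pass all of the list's elements automatically, regardless of how many it contains? I tried: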

merge.all("GoTerm", go.sigtop.l[c(1:length(names(go.sigtop.l)))]) 

But this doesn't work.
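The attempt fails because go.sigtop.l[...] is still a single list, so inside merge.all, list(...) is a list holding one element (the whole list). Reduce() returns a length-one input unchanged without ever calling merge(), so no merged data frame comes back. do.call() instead passes each list element as its own argument, after which merge.all folds merge() over the frames left to right, i.e. merge(merge(f1, f2), f3). A minimal sketch with a hypothetical toy list whose frames have a single value column each (so merge.all's one-name-per-frame renaming lines up):

```r
# merge.all as defined in the question
merge.all <- function(by, ...) {
  frames <- list(...)
  df <- Reduce(function(x, y) { merge(x, y, by = by, all = TRUE) }, frames)
  names(df) <- c(by, paste("V", seq(length(frames)), sep = ""))
  return(df)
}

# Hypothetical toy list with one value column per frame
frames <- list(f1 = data.frame(GoTerm = c("x", "y"), PValue = c(0.01, 0.02)),
               f2 = data.frame(GoTerm = c("y", "z"), PValue = c(0.03, 0.04)),
               f3 = data.frame(GoTerm = c("x", "z"), PValue = c(0.05, 0.06)))

# The whole list arrives as ONE argument in `...`: Reduce() just returns it,
# so the result is a plain list, not a merged data frame
bad <- merge.all("GoTerm", frames)
print(is.data.frame(bad))  # FALSE

# do.call() spreads the elements out, equivalent to
# merge.all("GoTerm", frames$f1, frames$f2, frames$f3).
# `by` is named so that list element names cannot be partially matched to it.
good <- do.call(merge.all, c(list(by = "GoTerm"), frames))
print(good)
```

With the question's data this would be do.call(merge.all, c(list(by = "GoTerm"), go.sigtop.l)); the NA column names remain until merge.all builds one name per value column rather than one per frame.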

I know there are many answers to similar questions, but none of the ones I have seen solved my problem. Cheers.

2 answers:

Answer 0 (score: 1)

It isn't pretty, but it can be done with a for loop. If someone posts a better solution, I will accept that instead of this one:

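# Rename value columns first (PValue -> PValue.SRSF1_cyto, ...) so labels are right for any list length
df.m <- NULL
for (nm in names(go.sigtop.l)) {
  frame <- go.sigtop.l[[nm]]
  value.cols <- names(frame) != "GoTerm"
  names(frame)[value.cols] <- paste(names(frame)[value.cols], nm, sep = ".")
  df.m <- if (is.null(df.m)) frame else merge(df.m, frame, by = "GoTerm", all = TRUE)
}
df.m[is.na(df.m)] <- 0  # NA -> 0: a term absent from a sample then shows PValue 0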
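The loop's result (value columns suffixed with the list element names, all GO terms kept) can also be produced without an explicit loop by renaming each frame with Map() and then folding merge() over the list with Reduce(). A minimal sketch; merge.list is a hypothetical helper name, the toy frames are made up, and unlike the loop it leaves missing values as NA:

```r
# Hypothetical helper: full outer join of a named list of data frames on `by`,
# with every non-key column suffixed by its list element's name
merge.list <- function(frames, by) {
  renamed <- Map(function(d, nm) {
    value.cols <- names(d) != by
    names(d)[value.cols] <- paste(names(d)[value.cols], nm, sep = ".")
    d
  }, frames, names(frames))
  Reduce(function(x, y) merge(x, y, by = by, all = TRUE), renamed)
}

# Toy data with the same column layout as go.sigtop.l (three frames: an odd count)
frames <- list(
  SRSF1_cyto  = data.frame(GoTerm = c("x", "y"), PValue = c(0.01, 0.02), Fold.Enrichment = c(2, 3)),
  SRSF1_total = data.frame(GoTerm = c("y", "z"), PValue = c(0.03, 0.04), Fold.Enrichment = c(4, 5)),
  SRSF2_cyto  = data.frame(GoTerm = c("x", "z"), PValue = c(0.05, 0.06), Fold.Enrichment = c(6, 7)))

m <- merge.list(frames, by = "GoTerm")
print(names(m))
```

For the question's data: go.df <- merge.list(go.sigtop.l, by = "GoTerm").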

> head(df.m)
                                                   GoTerm PValue.SRSF1_cyto Fold.Enrichment.SRSF1_cyto PValue.SRSF1_total Fold.Enrichment.SRSF1_total PValue.SRSF2_cyto
1                          aminoacyl-tRNA ligase activity       0.000000000                   0.000000                  0                           0                 0
2                                    beta-catenin binding       0.000000000                   0.000000                  0                           0                 0
3                          cell adhesion molecule binding       0.000000000                   0.000000                  0                           0                 0
4                           cytochrome-c oxidase activity       0.000000000                   0.000000                  0                           0                 0
5                            cytoskeletal protein binding       0.000000000                   0.000000                  0                           0                 0
6 general RNA polymerase II transcription factor activity       0.007097505                   4.840801                  0                           0                 0
  Fold.Enrichment.SRSF2_cyto PValue.SRSF2_total Fold.Enrichment.SRSF2_total PValue.SRSF3_cyto Fold.Enrichment.SRSF3_cyto PValue.SRSF3_total Fold.Enrichment.SRSF3_total
1                          0                  0                           0       0.000000000                   0.000000                  0                           0
2                          0                  0                           0       0.000186408                   5.037574                  0                           0
3                          0                  0                           0       0.000000000                   0.000000                  0                           0
4                          0                  0                           0       0.000000000                   0.000000                  0                           0
5                          0                  0                           0       0.000000000                   0.000000                  0                           0
6                          0                  0                           0       0.000000000                   0.000000                  0                           0
  PValue.SRSF4_cyto Fold.Enrichment.SRSF4_cyto PValue.SRSF4_total Fold.Enrichment.SRSF4_total PValue.SRSF5_cyto Fold.Enrichment.SRSF5_cyto PValue.SRSF5_total
1         0.0000000                    0.00000          0.0000000                    0.000000                 0                          0                  0
2         0.0000000                    0.00000          0.0000000                    0.000000                 0                          0                  0
3         0.0000000                    0.00000          0.0000000                    0.000000                 0                          0                  0
4         0.0025874                   14.26516          0.0000000                    0.000000                 0                          0                  0
5         0.0000000                    0.00000          0.0053485                    4.239176                 0                          0                  0
6         0.0000000                    0.00000          0.0000000                    0.000000                 0                          0                  0
  Fold.Enrichment.SRSF5_total PValue.SRSF6_cyto Fold.Enrichment.SRSF6_cyto PValue.SRSF6_total Fold.Enrichment.SRSF6_total PValue.SRSF7_cyto Fold.Enrichment.SRSF7_cyto
1                           0      0.0007474458                   12.03623                  0                           0                 0                          0
2                           0      0.0000000000                    0.00000                  0                           0                 0                          0
3                           0      0.0000000000                    0.00000                  0                           0                 0                          0
4                           0      0.0000000000                    0.00000                  0                           0                 0                          0
5                           0      0.0000000000                    0.00000                  0                           0                 0                          0
6                           0      0.0000000000                    0.00000                  0                           0                 0                          0
  PValue.SRSF7_total Fold.Enrichment.SRSF7_total
1        0.000000000                     0.00000
2        0.000000000                     0.00000
3        0.009078473                    20.42213
4        0.000000000                     0.00000
5        0.000000000                     0.00000
6        0.000000000                     0.00000

Answer 1 (score: 0)

Have you tried this function?

http://rss.acs.unt.edu/Rdoc/library/gtools/html/smartbind.html

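Adapted from the example at the link (smartbind() is in the gtools package):

library(gtools)
df1 <- data.frame(list(A=1:10), B=LETTERS[1:10], C=rnorm(10) )
df2 <- data.frame(A=11:20, D=rnorm(10), E=letters[1:10] )
df3 <- df1
df4 <- df2

# do.call() passes each data frame to smartbind() as a separate argument
out <- do.call(smartbind, list(df1, df2, df3, df4))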

In our case:

out <- do.call(smartbind, go.sigtop.l)
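Note that smartbind() binds rows (an rbind that tolerates differing column names): it stacks the data frames rather than joining them on GoTerm, so a term found in several samples appears once per sample instead of once overall. Because every element of go.sigtop.l has the same columns, base R can do the same stacking. A minimal sketch on a hypothetical toy list, with an added Sample column (also hypothetical) recording where each row came from:

```r
# Hypothetical toy list with the same shape as go.sigtop.l
frames <- list(SRSF1_cyto  = data.frame(GoTerm = c("x", "y"), PValue = c(0.01, 0.02)),
               SRSF1_total = data.frame(GoTerm = c("y", "z"), PValue = c(0.03, 0.04)))

# Stack the rows, prefixing each frame with a Sample column naming its list element
stacked <- do.call(rbind, Map(function(d, nm) data.frame(Sample = nm, d), frames, names(frames)))
rownames(stacked) <- NULL
print(stacked)  # 4 rows: term "y" appears twice, once per sample
```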

Edited answer.