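I have a list of data frames with different numbers of rows that I want to merge. There is a nice solution for merging multiple data frames, which I have used and which works: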
> go.sigtop.l[c(1:3)]
$SRSF1_cyto
GoTerm PValue Fold.Enrichment
1 lipid kinase activity 0.0044501957 5.378668
2 general RNA polymerase II transcription factor activity 0.0070975052 4.840801
3 protein methyltransferase activity 0.0022675162 4.302935
4 N-methyltransferase activity 0.0089131138 3.850638
5 structure-specific DNA binding 0.0002666942 3.821685
6 purine NTP-dependent helicase activity 0.0007861753 3.377303
$SRSF1_total
GoTerm PValue Fold.Enrichment
1 translation factor activity, nucleic acid binding 1.460691e-04 6.953428
2 structural constituent of ribosome 8.530549e-03 3.948718
3 RNA binding 3.479534e-09 3.675900
4 nucleotide binding 9.800564e-04 1.638817
$SRSF2_cyto
GoTerm PValue Fold.Enrichment
1 protein-lysine N-methyltransferase activity 0.001722436 16.486352
2 lysine N-methyltransferase activity 0.001722436 16.486352
3 histone-lysine N-methyltransferase activity 0.001722436 16.486352
4 histone methyltransferase activity 0.003756630 12.607211
5 N-methyltransferase activity 0.007775608 9.741935
6 protein methyltransferase activity 0.008275521 9.525448
> merge.all <- function(by, ...) {
+ frames <- list(...)
+ df <- Reduce(function(x, y) { merge(x, y, by = by, all = TRUE) }, frames)
+ names(df) <- c(by, paste("V", seq(length(frames)), sep = ""))
+
+ return(df)
+ }
> go.df <- merge.all("GoTerm", go.sigtop.l[[1]], go.sigtop.l[[2]], go.sigtop.l[[3]])
> go.df
GoTerm V1 V2 V3 NA NA NA
1 general RNA polymerase II transcription factor activity 0.0070975052 4.840801 NA NA NA NA
2 histone-lysine N-methyltransferase activity NA NA NA NA 0.001722436 16.486352
3 histone methyltransferase activity NA NA NA NA 0.003756630 12.607211
4 lipid kinase activity 0.0044501957 5.378668 NA NA NA NA
5 lysine N-methyltransferase activity NA NA NA NA 0.001722436 16.486352
6 N-methyltransferase activity 0.0089131138 3.850638 NA NA 0.007775608 9.741935
7 nucleotide binding NA NA 9.800564e-04 1.638817 NA NA
8 protein-lysine N-methyltransferase activity NA NA NA NA 0.001722436 16.486352
9 protein methyltransferase activity 0.0022675162 4.302935 NA NA 0.008275521 9.525448
10 purine NTP-dependent helicase activity 0.0007861753 3.377303 NA NA NA NA
11 RNA binding NA NA 3.479534e-09 3.675900 NA NA
12 structural constituent of ribosome NA NA 8.530549e-03 3.948718 NA NA
13 structure-specific DNA binding 0.0002666942 3.821685 NA NA NA NA
14 translation factor activity, nucleic acid binding NA NA 1.460691e-04 6.953428 NA NA
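The problem is that the number of data frames in the list will vary. How can I pass all elements automatically, regardless of how many the list contains? I tried: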
merge.all("GoTerm", go.sigtop.l[c(1:length(names(go.sigtop.l)))])
But this does not work.
I am aware of the many answers to similar questions, but none of the ones I have seen solved my problem. Cheers.
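A likely reason the call above has no effect: inside `merge.all`, `list(...)` wraps the whole list as a single element, so `Reduce` has nothing to merge and returns it unchanged. The sketch below reproduces this on toy data and shows `do.call` as one way to spread a list of any length into separate arguments. The `toy` list and the helper `merge.frames` are hypothetical stand-ins, not the original data or function.

```r
# Toy stand-in for go.sigtop.l (hypothetical data, not the original results)
toy <- list(
  a = data.frame(GoTerm = c("x", "y"), PValue = c(0.01, 0.02)),
  b = data.frame(GoTerm = c("y", "z"), PValue = c(0.03, 0.04)),
  c = data.frame(GoTerm = c("x", "z"), PValue = c(0.05, 0.06))
)

# Same idea as merge.all, but without the column-renaming step
merge.frames <- function(by, ...) {
  frames <- list(...)
  Reduce(function(x, y) merge(x, y, by = by, all = TRUE), frames)
}

# Passing the list as ONE argument: Reduce sees a single element
# and hands it back without ever calling merge()
unmerged <- merge.frames("GoTerm", toy)

# do.call() spreads the list elements into separate arguments,
# whatever the length of the list
merged <- do.call(merge.frames, c(list(by = "GoTerm"), unname(toy)))
```

Since the frames are already in a list, calling `Reduce` on that list directly would avoid the `...` round trip altogether.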
Answer 0 (score: 1)
This is not pretty, but it can be done with a for loop. If there is a better solution, I will accept that instead:
df.m <- go.sigtop.l[[1]]
# Merge the remaining data frames one at a time; seq_along()[-1] is empty for a one-element list
for (i in seq_along(go.sigtop.l)[-1]) {
  df.m <- merge(df.m, go.sigtop.l[[i]], by = "GoTerm", all = TRUE,
                suffixes = paste0(".", names(go.sigtop.l)[c(i - 1, i)]))
}
# Replace the NAs introduced by the outer join with 0
df.m[is.na(df.m)] <- 0
> head(df.m)
GoTerm PValue.SRSF1_cyto Fold.Enrichment.SRSF1_cyto PValue.SRSF1_total Fold.Enrichment.SRSF1_total PValue.SRSF2_cyto
1 aminoacyl-tRNA ligase activity 0.000000000 0.000000 0 0 0
2 beta-catenin binding 0.000000000 0.000000 0 0 0
3 cell adhesion molecule binding 0.000000000 0.000000 0 0 0
4 cytochrome-c oxidase activity 0.000000000 0.000000 0 0 0
5 cytoskeletal protein binding 0.000000000 0.000000 0 0 0
6 general RNA polymerase II transcription factor activity 0.007097505 4.840801 0 0 0
Fold.Enrichment.SRSF2_cyto PValue.SRSF2_total Fold.Enrichment.SRSF2_total PValue.SRSF3_cyto Fold.Enrichment.SRSF3_cyto PValue.SRSF3_total Fold.Enrichment.SRSF3_total
1 0 0 0 0.000000000 0.000000 0 0
2 0 0 0 0.000186408 5.037574 0 0
3 0 0 0 0.000000000 0.000000 0 0
4 0 0 0 0.000000000 0.000000 0 0
5 0 0 0 0.000000000 0.000000 0 0
6 0 0 0 0.000000000 0.000000 0 0
PValue.SRSF4_cyto Fold.Enrichment.SRSF4_cyto PValue.SRSF4_total Fold.Enrichment.SRSF4_total PValue.SRSF5_cyto Fold.Enrichment.SRSF5_cyto PValue.SRSF5_total
1 0.0000000 0.00000 0.0000000 0.000000 0 0 0
2 0.0000000 0.00000 0.0000000 0.000000 0 0 0
3 0.0000000 0.00000 0.0000000 0.000000 0 0 0
4 0.0025874 14.26516 0.0000000 0.000000 0 0 0
5 0.0000000 0.00000 0.0053485 4.239176 0 0 0
6 0.0000000 0.00000 0.0000000 0.000000 0 0 0
Fold.Enrichment.SRSF5_total PValue.SRSF6_cyto Fold.Enrichment.SRSF6_cyto PValue.SRSF6_total Fold.Enrichment.SRSF6_total PValue.SRSF7_cyto Fold.Enrichment.SRSF7_cyto
1 0 0.0007474458 12.03623 0 0 0 0
2 0 0.0000000000 0.00000 0 0 0 0
3 0 0.0000000000 0.00000 0 0 0 0
4 0 0.0000000000 0.00000 0 0 0 0
5 0 0.0000000000 0.00000 0 0 0 0
6 0 0.0000000000 0.00000 0 0 0 0
PValue.SRSF7_total Fold.Enrichment.SRSF7_total
1 0.000000000 0.00000
2 0.000000000 0.00000
3 0.009078473 20.42213
4 0.000000000 0.00000
5 0.000000000 0.00000
6 0.000000000 0.00000
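One caveat with the loop: `merge()` applies `suffixes` only to column names that clash, so a frame's columns can stay unsuffixed until the next merge, and with an odd number of frames the last one would keep plain `PValue` and `Fold.Enrichment`. A minimal sketch of an alternative, assuming every frame shares the key column `GoTerm`, is to rename the columns first and then fold the list with `Reduce`. The `toy` list and the helper `add_suffix` are hypothetical, for illustration only.

```r
# Toy stand-in for go.sigtop.l (hypothetical data, not the original results)
toy <- list(
  s1 = data.frame(GoTerm = c("x", "y"), PValue = c(0.01, 0.02), Fold.Enrichment = c(2, 3)),
  s2 = data.frame(GoTerm = c("y", "z"), PValue = c(0.03, 0.04), Fold.Enrichment = c(4, 5)),
  s3 = data.frame(GoTerm = c("x", "w"), PValue = c(0.05, 0.06), Fold.Enrichment = c(6, 7))
)

# Append the list name to every non-key column so all names are unique up front
add_suffix <- function(df, nm, by = "GoTerm") {
  keep <- names(df) != by
  names(df)[keep] <- paste(names(df)[keep], nm, sep = ".")
  df
}
renamed <- Map(add_suffix, toy, names(toy))

# Full outer join across the whole list, however many elements it has
merged <- Reduce(function(x, y) merge(x, y, by = "GoTerm", all = TRUE), renamed)
```

Because no column names clash after renaming, the result does not depend on how many frames the list holds, and the `NA` cells can still be replaced afterwards if zeros are wanted.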
Answer 1 (score: 0)
Have you tried this function:
http://rss.acs.unt.edu/Rdoc/library/gtools/html/smartbind.html
See the following code from the link:
df1 <- data.frame(list(A=1:10), B=LETTERS[1:10], C=rnorm(10) )
df2 <- data.frame(A=11:20, D=rnorm(10), E=letters[1:10] )
df3 <- df1
df4 <- df2
out <- smartbind( list(df1, df2, df3, df4))
In our case:
out <- smartbind(go.sigtop.l)
Answer edited.