How to convert a list containing multi-column elements into a data frame in R

Time: 2018-05-15 18:04:49

Tags: r

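I have a list object named results. The list contains 2 lists, each of which has several columns. I want to convert it into a data frame that combines the columns from each list. I know that columns of different lengths cannot be combined directly, so is there a way to fill the extra observations with NA? Here is a small part of the list object (results):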

results         
[[1]]           
         gene_name  x1                         x2
gene34556    gene1             0                0
gene11169    gene2   0.098757012                0
gene11319    gene3             0                0
gene1459     gene3             0                0
gen168232    gene5             0                0
gene2992     gene6   -1.93960816      0.042291503
gene305454   gene7             0                0
gene3280     gene8             0                0

[[2]]           
            gene_name          x1             x2
gene34556   gene1               0              0
gene11169   gene2    -3.785515694              0
gene11319   gene3               0              0
gene1459    gene4               0              0
gene2992    gene5    -2.308363477   -0.267514619

1 Answer:

Answer 0 (score: 0)

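Since both lists contain similar gene names, it is not clear how you want the observations to be arranged in the result. However, here are a few ways to combine the two elements of the list into a single data frame: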

library(data.table)  # provides rbindlist()

# Example list of two data frames; the gene IDs are stored as row names
result <- list(data.frame(gene_name = c("a", "b", "c"),
                          x1 = rnorm(3),
                          x2 = rnorm(3),
                          row.names = c("gene34556", "gene11169", "gene11319"),
                          stringsAsFactors = FALSE),
               data.frame(gene_name = c("a", "b", "c", "x"),
                          x1 = rnorm(4),
                          x2 = rnorm(4),
                          row.names = c("gene34556", "gene11169", "gene11319", "gene3280"),
                          stringsAsFactors = FALSE))


# combine list "vertically"
rbindlist(result)
#    gene_name         x1          x2
# 1:         a  0.3522310 -0.31057642
# 2:         b -0.7110728  1.12948383
# 3:         c -1.6032146 -0.87341353
# 4:         a -0.1599496 -1.03543084
# 5:         b -0.1081441  1.93735177
# 6:         c  0.9923114 -0.02319378
# 7:         x -0.8283895  0.72096001
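
Note that rbindlist() returns a data.table, which has no row names, so the gene IDs stored as row names do not appear in the output above. When stacking, it can also help to record which list element each row came from. The following is a minimal base R sketch of that idea, not part of the original answer; the column name `source` and the toy values are assumptions made for illustration. (rbindlist() offers an `idcol` argument for the same purpose.)

```r
# Toy data; the values are made up for illustration
result <- list(
  data.frame(gene_name = c("a", "b", "c"),
             x1 = c(0.1, 0.2, 0.3),
             stringsAsFactors = FALSE),
  data.frame(gene_name = c("a", "b", "c", "x"),
             x1 = c(1.1, 1.2, 1.3, 1.4),
             stringsAsFactors = FALSE)
)

# Tag every data frame with its position in the list
# ("source" is a hypothetical column name)
tagged <- Map(function(df, i) {
  df$source <- i
  df
}, result, seq_along(result))

# Stack the tagged data frames on top of each other
stacked <- do.call(rbind, tagged)
rownames(stacked) <- NULL
stacked
```

Each row of `stacked` then carries the index of the list element it came from, so duplicated gene names stay distinguishable.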

# merge both data frames within the list on gene_name;
# all = TRUE keeps unmatched rows and fills them with NA
base::merge(result[[1]], result[[2]], by = "gene_name", all = TRUE)
#   gene_name       x1.x       x2.x       x1.y        x2.y
# 1         a  0.3522310 -0.3105764 -0.1599496 -1.03543084
# 2         b -0.7110728  1.1294838 -0.1081441  1.93735177
# 3         c -1.6032146 -0.8734135  0.9923114 -0.02319378
# 4         x         NA         NA -0.8283895  0.72096001
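
The default `.x` and `.y` suffixes say little about where a column came from. As a small sketch (the suffix strings and toy values here are assumptions, not part of the original answer), the `suffixes` argument of merge() can label the columns explicitly:

```r
# Toy data; the values are made up for illustration
a <- data.frame(gene_name = c("a", "b", "c"),
                x1 = c(0.1, 0.2, 0.3),
                stringsAsFactors = FALSE)
b <- data.frame(gene_name = c("a", "b", "c", "x"),
                x1 = c(1.1, 1.2, 1.3, 1.4),
                stringsAsFactors = FALSE)

# Full outer join; ".list1" and ".list2" are hypothetical labels
merged <- merge(a, b, by = "gene_name", all = TRUE,
                suffixes = c(".list1", ".list2"))
merged
```

The gene that exists only in the second data frame gets NA in the `x1.list1` column.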

If the data frames in the list need to be merged on their row names, use by = 0:

# merge both data frames within the list on their row names:
base::merge(result[[1]], result[[2]], by = 0, all = TRUE)
#   Row.names gene_name.x       x1.x       x2.x gene_name.y       x1.y        x2.y
# 1 gene11169           b -0.1694079  2.1168323           b  2.0969813  0.82247288
# 2 gene11319           c  1.5375766 -1.4373368           c  2.0990688 -0.06107935
# 3  gene3280        <NA>         NA         NA           x  0.2528695  1.66448111
# 4 gene34556           a -0.5648451 -0.4891148           a -0.1783414  0.10531560
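
If the elements should instead be placed side by side purely by position, as the question literally describes, the shorter data frames can be padded with NA rows first. This is a rough sketch under that assumption; `pad_rows` is a hypothetical helper and the toy values are made up. Rows are aligned by position only, so a given row does not necessarily describe the same gene in every block of columns.

```r
# Toy data; the values are made up for illustration
results <- list(
  data.frame(gene_name = c("a", "b", "c"),
             x1 = c(0.1, 0.2, 0.3),
             stringsAsFactors = FALSE),
  data.frame(gene_name = c("a", "b", "c", "x", "y"),
             x1 = c(1.1, 1.2, 1.3, 1.4, 1.5),
             stringsAsFactors = FALSE)
)

# Hypothetical helper: extend a data frame to n rows.
# Indexing past the last row yields rows filled with NA.
pad_rows <- function(df, n) {
  out <- df[seq_len(n), , drop = FALSE]
  rownames(out) <- NULL
  out
}

n_max <- max(vapply(results, nrow, integer(1)))

# Pad each element and suffix its columns with the element's index
padded <- lapply(seq_along(results), function(i) {
  out <- pad_rows(results[[i]], n_max)
  names(out) <- paste0(names(out), "_", i)
  out
})

# Bind the padded data frames column-wise
combined <- do.call(cbind, padded)
combined
```

If rows must line up by gene rather than by position, the merge() approaches above are the safer choice.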

Edit

If the list contains more than two data frames:

result <- list(data.frame(gene_name = c("a", "b", "c"),
                          x1 = rnorm(3),
                          x2 = rnorm(3),
                          row.names = c("gene34556", "gene11169", "gene11319"),
                          stringsAsFactors = FALSE),
               data.frame(gene_name = c("a", "b", "c", "x"),
                          x1 = rnorm(4),
                          x2 = rnorm(4),
                          row.names = c("gene34556", "gene11169", "gene11319", "gene3280"),
                          stringsAsFactors = FALSE),
               data.frame(gene_name = c("a", "c", "x"),
                          x1 = rnorm(3),
                          x2 = rnorm(3),
                          row.names = c("gene34556", "gene11319", "gene3280"),
                          stringsAsFactors = FALSE))

# add the row names as a regular column so they can be used as a merge key
new.result <- lapply(result, function(x) {
  cbind(row_name = rownames(x), x, stringsAsFactors = FALSE)
})

# merge all data frames pairwise using the base merge() function
library(magrittr)  # provides the %>% pipe
new.result %>%
  Reduce(function(df1, df2) merge(df1, df2, by = "row_name", all = TRUE), .)

# The result is a single data frame:
#    row_name gene_name.x        x1.x       x2.x gene_name.y       x1.y       x2.y gene_name         x1         x2
# 1 gene11169           b  0.80895379 0.02031943           b -0.3121325  0.7952539      <NA>         NA         NA
# 2 gene11319           c -1.20666887 1.05976176           c  0.4624013 -0.2617053         c  1.6058288  1.5488336
# 3 gene34556           a -0.01044742 0.11722414           a -0.2593305  1.2252805         a  0.8526598  0.2695985
# 4  gene3280        <NA>          NA         NA           x  1.0222144  1.6846108         x -0.1128416 -0.4463099

# For large datasets, full_join() from the dplyr package might be faster:
library(dplyr)  # provides full_join() and the %>% pipe
new.result %>%
  Reduce(function(df1, df2) full_join(df1, df2, by = "row_name"), .)
#    row_name gene_name.x       x1.x        x2.x gene_name.y       x1.y       x2.y gene_name         x1         x2
# 1 gene34556           a  0.8141012 -0.27145107           a -0.1113020 -0.1708712         a -0.4537174 -1.0222622
# 2 gene11169           b -0.2260749  0.09578933           b -1.7803083 -0.9246307      <NA>         NA         NA
# 3 gene11319           c  2.3439445 -1.11945962           c  0.3269329 -1.6452048         c -1.0486770  0.5048081
# 4  gene3280        <NA>         NA          NA           x -1.7521306  0.7690779         x -1.3238697  0.4762742
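
With three or more data frames, the automatically generated names (`.x`, `.y`, then no suffix) become hard to read. One possible workaround, sketched here under the assumption that each data frame already carries a `row_name` key column, is to suffix the value columns with the list index before merging. The `_1`, `_2`, ... naming scheme and the toy values are illustrative only.

```r
# Toy data; the values are made up for illustration
result <- list(
  data.frame(row_name = c("g1", "g2", "g3"),
             x1 = c(1, 2, 3),
             stringsAsFactors = FALSE),
  data.frame(row_name = c("g1", "g2", "g3", "g4"),
             x1 = c(10, 20, 30, 40),
             stringsAsFactors = FALSE),
  data.frame(row_name = c("g1", "g3", "g4"),
             x1 = c(100, 300, 400),
             stringsAsFactors = FALSE)
)

# Suffix every non-key column with the index of its list element
renamed <- lapply(seq_along(result), function(i) {
  df <- result[[i]]
  value_cols <- names(df) != "row_name"
  names(df)[value_cols] <- paste0(names(df)[value_cols], "_", i)
  df
})

# Full outer join across all elements; no name clashes remain,
# so merge() does not need to add its own suffixes
merged <- Reduce(function(df1, df2) merge(df1, df2, by = "row_name", all = TRUE),
                 renamed)
merged
```

Every value column in `merged` then states which list element it came from, and genes missing from an element show NA in that element's columns.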