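I have a list object named results. The list contains 2 elements, and each element has several columns. I would like to convert it into a data frame that combines the columns from each element. I know that columns of different lengths cannot be combined directly, so is there a way to fill the extra observations with NA? Here is a small part of the list object (results):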
results
[[1]]
gene_name x1 x2
gene34556 gene1 0 0
gene11169 gene2 0.098757012 0
gene11319 gene3 0 0
gene1459 gene3 0 0
gen168232 gene5 0 0
gene2992 gene6 -1.93960816 0.042291503
gene305454 gene7 0 0
gene3280 gene8 0 0
[[2]]
gene_name x1 x2
gene34556 gene1 0 0
gene11169 gene2 -3.785515694 0
gene11319 gene3 0 0
gene1459 gene4 0 0
gene2992 gene5 -2.308363477 -0.267514619
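Answer 0 (score: 0)
Because the gene names in the two list elements overlap, it is not clear how you want the observations to be combined. Here are a few ways to combine the two elements of the list into a single data frame: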
library(data.table)
result <- list(data.frame(gene_name = c("a", "b", "c"),
                          x1 = rnorm(3),
                          x2 = rnorm(3),
                          row.names = c("gene34556", "gene11169", "gene11319"),
                          stringsAsFactors = FALSE),
               data.frame(gene_name = c("a", "b", "c", "x"),
                          x1 = rnorm(4),
                          x2 = rnorm(4),
                          row.names = c("gene34556", "gene11169", "gene11319", "gene3280"),
                          stringsAsFactors = FALSE))
# combine the list "vertically" (stack the rows; note that row names are dropped)
rbindlist(result)
# gene_name x1 x2
# 1: a 0.3522310 -0.31057642
# 2: b -0.7110728 1.12948383
# 3: c -1.6032146 -0.87341353
# 4: a -0.1599496 -1.03543084
# 5: b -0.1081441 1.93735177
# 6: c 0.9923114 -0.02319378
# 7: x -0.8283895 0.72096001
# merge both data frames in the list on gene_name;
# all = TRUE keeps unmatched rows and fills the missing side with NA
merge(result[[1]], result[[2]], by = "gene_name", all = TRUE)
# gene_name x1.x x2.x x1.y x2.y
# 1 a 0.3522310 -0.3105764 -0.1599496 -1.03543084
# 2 b -0.7110728 1.1294838 -0.1081441 1.93735177
# 3 c -1.6032146 -0.8734135 0.9923114 -0.02319378
# 4 x NA NA -0.8283895 0.72096001
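Stacking with rbindlist() loses both the row names and the information about which list element each row came from. The following is a minimal sketch of one way to keep both, assuming the row names are the gene IDs you care about. The column names row_name and list_index and the fixed numeric values are chosen here purely for illustration.

```r
library(data.table)

# Small deterministic stand-in for the list used above (values are made up).
result <- list(
  data.frame(gene_name = c("a", "b", "c"),
             x1 = c(0.1, 0.2, 0.3),
             x2 = c(1, 2, 3),
             row.names = c("gene34556", "gene11169", "gene11319"),
             stringsAsFactors = FALSE),
  data.frame(gene_name = c("a", "b", "c", "x"),
             x1 = c(0.4, 0.5, 0.6, 0.7),
             x2 = c(4, 5, 6, 7),
             row.names = c("gene34556", "gene11169", "gene11319", "gene3280"),
             stringsAsFactors = FALSE)
)

# Copy the row names into a regular column, because data.table discards row names.
with_ids <- lapply(result, function(df) {
  cbind(row_name = rownames(df), df, stringsAsFactors = FALSE)
})

# idcol adds a column recording which list element each row came from.
stacked <- rbindlist(with_ids, idcol = "list_index")
print(stacked)
```

If the data frames did not all share the same columns, rbindlist(with_ids, fill = TRUE) would pad the missing columns with NA.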
If the data frames in the list need to be merged on their row names instead, use by = 0:
# merge both data frames in the list on their row names:
merge(result[[1]], result[[2]], by = 0, all = TRUE)
# Row.names gene_name.x x1.x x2.x gene_name.y x1.y x2.y
# 1 gene11169 b -0.1694079 2.1168323 b 2.0969813 0.82247288
# 2 gene11319 c 1.5375766 -1.4373368 c 2.0990688 -0.06107935
# 3 gene3280 <NA> NA NA x 0.2528695 1.66448111
# 4 gene34556 a -0.5648451 -0.4891148 a -0.1783414 0.10531560
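The question literally asks for the columns to be placed side by side, with NA filling the shorter element. A rough sketch of that is shown below, assuming you really do want alignment by row position rather than by gene. The helper variables n_max and padded, the _1/_2 suffixes, and the sample values are illustrative only. Because genes may end up on different rows in each element, the merge() approaches above are usually the safer choice.

```r
# Two data frames of different lengths (values are made up for illustration).
results <- list(
  data.frame(gene_name = c("gene1", "gene2", "gene3"),
             x1 = c(0, 0.1, 0),
             x2 = c(0, 0, 0),
             stringsAsFactors = FALSE),
  data.frame(gene_name = c("gene1", "gene2"),
             x1 = c(0, -3.8),
             x2 = c(0, 0),
             stringsAsFactors = FALSE)
)

# Number of rows in the longest element.
n_max <- max(vapply(results, nrow, integer(1)))

padded <- lapply(seq_along(results), function(i) {
  # Indexing past the last row yields rows filled with NA.
  df <- results[[i]][seq_len(n_max), , drop = FALSE]
  rownames(df) <- NULL
  # Suffix the column names with the list position to keep them unique.
  names(df) <- paste0(names(df), "_", i)
  df
})

# All pieces now have n_max rows, so they can be bound column-wise.
combined <- do.call(cbind, padded)
print(combined)
```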
Edit
If the list contains more than two data frames:
result <- list(data.frame(gene_name = c("a", "b", "c"),
                          x1 = rnorm(3),
                          x2 = rnorm(3),
                          row.names = c("gene34556", "gene11169", "gene11319"),
                          stringsAsFactors = FALSE),
               data.frame(gene_name = c("a", "b", "c", "x"),
                          x1 = rnorm(4),
                          x2 = rnorm(4),
                          row.names = c("gene34556", "gene11169", "gene11319", "gene3280"),
                          stringsAsFactors = FALSE),
               data.frame(gene_name = c("a", "c", "x"),
                          x1 = rnorm(3),
                          x2 = rnorm(3),
                          row.names = c("gene34556", "gene11319", "gene3280"),
                          stringsAsFactors = FALSE))
# add the row names as a regular column so they can serve as the merge key
new.result <- lapply(result, function(x) cbind(row_name = rownames(x), x, stringsAsFactors = FALSE))
# merge using the base merge() function
library(dplyr)  # provides the %>% pipe and full_join()
new.result %>%
  Reduce(function(df1, df2) merge(df1, df2, by = "row_name", all = TRUE), .)
# The result is a data frame:
# row_name gene_name.x x1.x x2.x gene_name.y x1.y x2.y gene_name x1 x2
# 1 gene11169 b 0.80895379 0.02031943 b -0.3121325 0.7952539 <NA> NA NA
# 2 gene11319 c -1.20666887 1.05976176 c 0.4624013 -0.2617053 c 1.6058288 1.5488336
# 3 gene34556 a -0.01044742 0.11722414 a -0.2593305 1.2252805 a 0.8526598 0.2695985
# 4 gene3280 <NA> NA NA x 1.0222144 1.6846108 x -0.1128416 -0.4463099
# For large datasets, full_join() from the dplyr package may be faster:
new.result %>%
Reduce(function(df1,df2) full_join(df1,df2, by='row_name'), .)
# row_name gene_name.x x1.x x2.x gene_name.y x1.y x2.y gene_name x1 x2
# 1 gene34556 a 0.8141012 -0.27145107 a -0.1113020 -0.1708712 a -0.4537174 -1.0222622
# 2 gene11169 b -0.2260749 0.09578933 b -1.7803083 -0.9246307 <NA> NA NA
# 3 gene11319 c 2.3439445 -1.11945962 c 0.3269329 -1.6452048 c -1.0486770 0.5048081
# 4 gene3280 <NA> NA NA x -1.7521306 0.7690779 x -1.3238697 0.4762742
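With the default .x/.y suffixes it becomes hard to tell which columns came from which list element, and with four or more data frames merge() can end up producing repeated names such as gene_name.x. One possible workaround, sketched here in base R, is to tag every column with its list position before merging. The _1, _2, ... suffix scheme, the variable names tagged and merged, and the sample values are assumptions made for this illustration.

```r
# Four small data frames keyed by row name (values are made up).
result <- list(
  data.frame(gene_name = c("a", "b", "c"), x1 = c(1, 2, 3), x2 = c(10, 20, 30),
             row.names = c("gene34556", "gene11169", "gene11319"),
             stringsAsFactors = FALSE),
  data.frame(gene_name = c("a", "b", "c", "x"), x1 = c(4, 5, 6, 7), x2 = c(40, 50, 60, 70),
             row.names = c("gene34556", "gene11169", "gene11319", "gene3280"),
             stringsAsFactors = FALSE),
  data.frame(gene_name = c("a", "c", "x"), x1 = c(8, 9, 10), x2 = c(80, 90, 100),
             row.names = c("gene34556", "gene11319", "gene3280"),
             stringsAsFactors = FALSE),
  data.frame(gene_name = c("b", "x"), x1 = c(11, 12), x2 = c(110, 120),
             row.names = c("gene11169", "gene3280"),
             stringsAsFactors = FALSE)
)

# Suffix each column with its list position, then expose the row names as the key.
tagged <- lapply(seq_along(result), function(i) {
  df <- result[[i]]
  names(df) <- paste0(names(df), "_", i)
  cbind(row_name = rownames(df), df, stringsAsFactors = FALSE)
})

# Successive full outer joins; only row_name is shared, so no .x/.y suffixes appear.
merged <- Reduce(function(a, b) merge(a, b, by = "row_name", all = TRUE), tagged)
print(merged)
```

Rows that are missing from a given list element show NA in that element's columns, as in the outputs above.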