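I need to combine data sets by row, but they have different columns. How can I easily get R to bind the rows, add the missing columns, and fill them with NAs? This is what I do at the moment (doing it for many merges is very time-consuming):
Create fake data...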
x1<-LETTERS[1:3]
x2<-letters[1:3]
x3<-rnorm(3)
x4<-rnorm(3)
x5<-rnorm(3)
Examples of several data.frames with some columns in common and some different...
data.frame(x1,x2,x3,x4,x5)
data.frame(x1,x3,x4,x5)
data.frame(x2,x3,x4,x5)
data.frame(x1,x2,x3,x4,x5)
How I currently combine them...
DF<-data.frame(rbind(data.frame(x1,x2,x3,x4,x5),
data.frame(x1,x2,x3,x4,x5),
data.frame("x2"=rep(NA,3),data.frame(x1,x3,x4,x5)),
data.frame("x1"=rep(NA,3),data.frame(x2,x3,x4,x5))))
DF
EDIT: I tried the code suggested below:
l <- list(data.frame(x1,x2,x3,x4,x5),
data.frame(x1,x3,x4,x5),
data.frame(x2,x3,x4,x5),
data.frame(x1,x2,x3,x4,x5))
merger <- function(l) lapply(2:length(l), function(x) merge(l[[x-1]], l[[x]], all=TRUE))
while (length(l) != 1) l<-merger(l)
l
Which yields:
[[1]]
x1 x3 x4 x5 x2
1 A 0.25492 0.30160 0.259287 a
2 B -0.25937 0.45936 -0.075415 b
3 C -0.53493 1.18316 0.627335 c
instead of:
> DF
x1 x2 x3 x4 x5
1 A a 0.25492 0.30160 0.259287
2 B b -0.25937 0.45936 -0.075415
3 C c -0.53493 1.18316 0.627335
4 A a 0.25492 0.30160 0.259287
5 B b -0.25937 0.45936 -0.075415
6 C c -0.53493 1.18316 0.627335
7 A <NA> 0.25492 0.30160 0.259287
8 B <NA> -0.25937 0.45936 -0.075415
9 C <NA> -0.53493 1.18316 0.627335
10 <NA> a 0.25492 0.30160 0.259287
11 <NA> b -0.25937 0.45936 -0.075415
12 <NA> c -0.53493 1.18316 0.627335
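The gap between the two results above comes from merge() performing a join on the shared columns rather than stacking rows: rows that agree on every shared column are matched and combined into one. A minimal sketch with made-up data (the names a, b, id, grp and val are illustrative only and do not appear in the question):

```r
# Illustrative data (made up for this sketch)
a <- data.frame(id = c("A", "B", "C"), grp = c("a", "b", "c"), val = c(1.5, 2.5, 3.5))
b <- a[, c("id", "val")]   # same rows, but without the 'grp' column

# merge() joins on the shared columns (id, val); every row of b matches a row of a
joined <- merge(a, b, all = TRUE)
nrow(joined)   # 3: matched rows are combined, not duplicated

# Stacking requires padding the missing column first, then rbind()
b$grp <- NA
stacked <- rbind(a, b[, names(a)])
nrow(stacked)  # 6: every input row is kept
```

So merge(..., all = TRUE) only behaves like a row bind when no rows happen to match on the shared columns.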
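EDIT 2: Sorry for lengthening the original post, but my low rep doesn't allow me to answer my own question.
Combining Jaron's and daroczig's responses gives me what I want. I don't want to assign each data frame to an object, so putting them in a list and then using rbind.fill works nicely (see the code below).
Thank you both!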
x1<-LETTERS[1:3]
x2<-letters[1:3]
x3<-rnorm(3)
x4<-rnorm(3)
x5<-rnorm(3)
DFlist<-list(data.frame(x1,x2,x3,x4,x5),
data.frame(x1,x3,x4,x5),
data.frame(x2,x3,x4,x5),
data.frame(x1,x2,x3,x4,x5))
library(plyr)  # provides rbind.fill
rbind.fill(DFlist)
Answer 0 (score: 14)
I had to read your question several times before I understood what you were looking for, but maybe you want rbind.fill from plyr:
library(plyr)
d1 <- data.frame(x1,x2,x3,x4,x5)
d2 <- data.frame(x1,x3,x4,x5)
d3 <- data.frame(x2,x3,x4,x5)
d4 <- data.frame(x1,x2,x3,x4,x5)
> rbind.fill(d1,d4,d2,d3)
x1 x2 x3 x4 x5
1 A a 1.1216923 0.9236393 0.2749292
2 B b 1.1913278 1.1145664 -0.5070576
3 C c 0.2837657 -0.6631544 -1.0675885
4 A a 1.1216923 0.9236393 0.2749292
5 B b 1.1913278 1.1145664 -0.5070576
6 C c 0.2837657 -0.6631544 -1.0675885
7 A <NA> 1.1216923 0.9236393 0.2749292
8 B <NA> 1.1913278 1.1145664 -0.5070576
9 C <NA> 0.2837657 -0.6631544 -1.0675885
10 <NA> a 1.1216923 0.9236393 0.2749292
11 <NA> b 1.1913278 1.1145664 -0.5070576
12 <NA> c 0.2837657 -0.6631544 -1.0675885
Answer 1 (score: 2)
Use data.table::rbindlist with the fill = TRUE option:
data.table::rbindlist(
list(data.frame(x1,x2,x3,x4,x5),
data.frame(x1,x3,x4,x5),
data.frame(x2,x3,x4,x5),
data.frame(x1,x2,x3,x4,x5)),
fill = TRUE)
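A small variation, assuming the data.table package is installed: the idcol argument of rbindlist records which list element each row came from, and as.data.frame() converts the result back if a plain data.frame is needed. The column name "source" and the sample data are chosen here purely for illustration:

```r
library(data.table)

set.seed(1)
x1 <- LETTERS[1:3]; x2 <- letters[1:3]; x3 <- rnorm(3)
dfs <- list(data.frame(x1, x2, x3), data.frame(x1, x3), data.frame(x2, x3))

# idcol adds a column recording which list element each row came from
dt <- rbindlist(dfs, use.names = TRUE, fill = TRUE, idcol = "source")

# rbindlist() returns a data.table; convert if a plain data.frame is needed
df <- as.data.frame(dt)
```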
Answer 2 (score: 1)
Let's assume you have your data frames in a nice list:
l <- list(
data.frame(x2=rnorm(3),x1=rnorm(3)),
data.frame(x1=rnorm(3),x2=rnorm(3),x3=rnorm(3),x4=rnorm(3),x5=rnorm(3)),
data.frame(x5=rnorm(3),x2=rnorm(3),x3=rnorm(3),x4=rnorm(3),x1=rnorm(3)),
data.frame(x5=rnorm(3),x2=rnorm(3),x3=rnorm(3),x4=rnorm(3)),
data.frame(x2=rnorm(3),x1=rnorm(3),x3=rnorm(3),x4=rnorm(3))
)
Grab the first one and (as @joran suggested) merge all the rest into it, e.g. with a plain loop:
r <- l[[1]]
for (i in 2:length(l)) {
r <- merge(r, l[[i]], all=TRUE)
}
Check out r:
> r
x2 x3 x4 x1 x5
1 -1.72436 -0.774652 3.10001 0.23249 -1.278216
2 -1.25640 NA NA 0.32997 NA
3 -1.00652 -0.946254 1.17313 NA 2.014517
4 -0.53770 -0.466626 -0.63369 -1.48375 -1.135515
5 -0.49787 NA NA -0.34020 NA
6 -0.49704 -0.054175 0.85477 NA 0.831706
7 0.13027 0.421750 -0.18126 -0.65452 0.476576
8 0.18519 -1.006994 0.15141 0.66808 NA
9 0.33954 -0.224478 1.38596 NA 0.145807
10 0.57782 1.126430 -0.89582 0.80199 NA
11 0.59149 -0.447669 0.74855 -1.65790 0.059767
12 0.61374 0.751528 -1.93715 0.40125 -0.148243
13 0.89399 0.758481 -0.94801 0.05084 NA
14 0.94200 NA NA 0.24945 NA
15 0.99509 0.586097 -0.91455 -0.49909 0.823696
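I don't like that loop, so I wrote something that repeatedly merges neighbouring pairs until a single data frame remains: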
> merger <- function(l) lapply(2:length(l), function(x) merge(l[[x-1]], l[[x]], all=TRUE))
> while (length(l) != 1) l<-merger(l)
> l
[[1]]
x2 x1 x3 x4 x5
1 -1.72436 0.23249 -0.774652 3.10001 -1.278216
2 -1.25640 0.32997 NA NA NA
3 -1.00652 NA -0.946254 1.17313 2.014517
4 -0.53770 -1.48375 -0.466626 -0.63369 -1.135515
5 -0.49787 -0.34020 NA NA NA
6 -0.49704 NA -0.054175 0.85477 0.831706
7 0.13027 -0.65452 0.421750 -0.18126 0.476576
8 0.18519 0.66808 -1.006994 0.15141 NA
9 0.33954 NA -0.224478 1.38596 0.145807
10 0.57782 0.80199 1.126430 -0.89582 NA
11 0.59149 -1.65790 -0.447669 0.74855 0.059767
12 0.61374 0.40125 0.751528 -1.93715 -0.148243
13 0.89399 0.05084 0.758481 -0.94801 NA
14 0.94200 0.24945 NA NA NA
15 0.99509 -0.49909 0.586097 -0.91455 0.823696
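The loop above can also be written as a fold with base R's Reduce(), which avoids both the index bookkeeping and the while loop. This is a sketch with freshly generated data, not the list used above. Note that 2:length(l) counts down (2, 1) when the list has a single element, so seq_along(l)[-1] is the safer index for the explicit loop:

```r
set.seed(42)
l <- list(
  data.frame(x2 = rnorm(3), x1 = rnorm(3)),
  data.frame(x1 = rnorm(3), x2 = rnorm(3), x3 = rnorm(3)),
  data.frame(x3 = rnorm(3), x2 = rnorm(3))
)

# Left fold: merge(merge(l[[1]], l[[2]]), l[[3]]), the same order as the for loop
r_reduce <- Reduce(function(a, b) merge(a, b, all = TRUE), l)

# The explicit loop, for comparison; seq_along(l)[-1] is empty for a one-element list
r_loop <- l[[1]]
for (i in seq_along(l)[-1]) r_loop <- merge(r_loop, l[[i]], all = TRUE)

identical(r_reduce, r_loop)
```

Keep in mind that this is still a full outer join: it keeps all rows here only because random numeric values never match across data frames. With repeated values, as in the question's data, matching rows are combined.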