Combining data sets by row when they have different columns

Asked: 2011-10-25 23:09:16

Tags: r

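I need to combine data sets by row, but they have different columns. How can I easily get R to stack the rows, add the missing columns, and fill them with NAs? At the moment I do it like this (which is very time-consuming when there are many data sets to combine):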

Create fake data...

x1<-LETTERS[1:3]
x2<-letters[1:3]
x3<-rnorm(3)
x4<-rnorm(3)
x5<-rnorm(3)

An example of several data.frames, with some columns in common and some different...

data.frame(x1,x2,x3,x4,x5)
data.frame(x1,x3,x4,x5)
data.frame(x2,x3,x4,x5)
data.frame(x1,x2,x3,x4,x5)

How I currently combine them...

DF<-data.frame(rbind(data.frame(x1,x2,x3,x4,x5),
data.frame(x1,x2,x3,x4,x5),
data.frame("x2"=rep(NA,3),data.frame(x1,x3,x4,x5)),
data.frame("x1"=rep(NA,3),data.frame(x2,x3,x4,x5))))

DF
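The manual padding above can be generalised. What follows is a minimal base-R sketch, not code from the original post: the helper name `rbind_pad` is hypothetical, and it assumes every list element is a data frame with unique column names. It collects the union of all column names, adds each missing column as NA, and then stacks the padded data frames with `rbind`.

```r
# Hypothetical helper (not from the original post): stack data frames by row,
# adding any column a data frame lacks as NA before binding.
rbind_pad <- function(dfs) {
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) {
      d[[col]] <- rep(NA, nrow(d))
    }
    d[all_cols]  # same column order everywhere
  })
  do.call(rbind, padded)
}

x1 <- LETTERS[1:3]
x2 <- letters[1:3]
x3 <- rnorm(3)

dfs <- list(
  data.frame(x1, x2, x3, stringsAsFactors = FALSE),
  data.frame(x1, x3, stringsAsFactors = FALSE),
  data.frame(x2, x3, stringsAsFactors = FALSE)
)

padded_result <- rbind_pad(dfs)
padded_result
```

Passing `stringsAsFactors = FALSE` keeps the text columns as character vectors on older R versions, which avoids factor-level surprises when NA columns are bound to them.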

Edit: I tried the code suggested below:

l <- list(data.frame(x1,x2,x3,x4,x5),
          data.frame(x1,x3,x4,x5),
          data.frame(x2,x3,x4,x5),
          data.frame(x1,x2,x3,x4,x5))

merger <- function(l) lapply(2:length(l), function(x) merge(l[[x-1]], l[[x]], all=TRUE)) 
while (length(l) != 1) l<-merger(l) 

l

Which yields:

[[1]]
  x1       x3      x4        x5 x2
1  A  0.25492 0.30160  0.259287  a
2  B -0.25937 0.45936 -0.075415  b
3  C -0.53493 1.18316  0.627335  c

> DF
     x1   x2       x3      x4        x5
1     A    a  0.25492 0.30160  0.259287
2     B    b -0.25937 0.45936 -0.075415
3     C    c -0.53493 1.18316  0.627335
4     A    a  0.25492 0.30160  0.259287
5     B    b -0.25937 0.45936 -0.075415
6     C    c -0.53493 1.18316  0.627335
7     A <NA>  0.25492 0.30160  0.259287
8     B <NA> -0.25937 0.45936 -0.075415
9     C <NA> -0.53493 1.18316  0.627335
10 <NA>    a  0.25492 0.30160  0.259287
11 <NA>    b -0.25937 0.45936 -0.075415
12 <NA>    c -0.53493 1.18316  0.627335
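The two results above differ because `merge()` performs a database-style join on the columns the inputs share, whereas `rbind()` stacks rows. Since all the data frames in the question hold the same values, the joins match row for row and collapse to three rows instead of twelve. A small sketch with made-up data (the objects `a` and `b` are illustrative only) shows the difference:

```r
# Two illustrative data frames with identical contents
a <- data.frame(id = 1:3, v = c(10, 20, 30))
b <- data.frame(id = 1:3, v = c(10, 20, 30))

# merge() joins on all shared columns (id and v), so matching rows are combined
n_merged <- nrow(merge(a, b, all = TRUE))

# rbind() simply stacks the rows
n_stacked <- nrow(rbind(a, b))

n_merged   # 3
n_stacked  # 6
```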

Edit 2: Sorry for lengthening the original post, but my low reputation does not allow me to answer my own question.

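Combining the answers from Jaron and daroczig produces the result I wanted. I did not want to assign each data frame to an object, so putting them in a list and then using rbind.fill works very well (see the code below).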

Thank you both!

x1<-LETTERS[1:3] 
x2<-letters[1:3] 
x3<-rnorm(3) 
x4<-rnorm(3) 
x5<-rnorm(3)

DFlist<-list(data.frame(x1,x2,x3,x4,x5), 
             data.frame(x1,x3,x4,x5),
             data.frame(x2,x3,x4,x5), 
             data.frame(x1,x2,x3,x4,x5))

library(plyr)  # provides rbind.fill
rbind.fill(DFlist)

3 Answers:

Answer 0: (score: 14)

I had to read your question several times before I understood what you were looking for, but maybe you want rbind.fill from plyr:

library(plyr)  # provides rbind.fill

d1 <- data.frame(x1,x2,x3,x4,x5)
d2 <- data.frame(x1,x3,x4,x5)
d3 <- data.frame(x2,x3,x4,x5)
d4 <- data.frame(x1,x2,x3,x4,x5)

> rbind.fill(d1,d4,d2,d3)
     x1   x2        x3         x4         x5
1     A    a 1.1216923  0.9236393  0.2749292
2     B    b 1.1913278  1.1145664 -0.5070576
3     C    c 0.2837657 -0.6631544 -1.0675885
4     A    a 1.1216923  0.9236393  0.2749292
5     B    b 1.1913278  1.1145664 -0.5070576
6     C    c 0.2837657 -0.6631544 -1.0675885
7     A <NA> 1.1216923  0.9236393  0.2749292
8     B <NA> 1.1913278  1.1145664 -0.5070576
9     C <NA> 0.2837657 -0.6631544 -1.0675885
10 <NA>    a 1.1216923  0.9236393  0.2749292
11 <NA>    b 1.1913278  1.1145664 -0.5070576
12 <NA>    c 0.2837657 -0.6631544 -1.0675885

Answer 1: (score: 2)

Use data.table::rbindlist with the fill = TRUE option:

data.table::rbindlist(
  list(data.frame(x1,x2,x3,x4,x5), 
       data.frame(x1,x3,x4,x5),
       data.frame(x2,x3,x4,x5), 
       data.frame(x1,x2,x3,x4,x5)),
  fill = TRUE)
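As a small extension of the call above, here is a sketch that also records which input each row came from and converts the result back to a plain data.frame. The list names `first` and `second` and the column name `source` are illustrative assumptions, not part of the original answer; `idcol` is an argument of `rbindlist`. The guard only skips the example when data.table is not installed.

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  # Illustrative inputs with partially overlapping columns
  dfs <- list(
    first  = data.frame(x1 = c("A", "B"), x3 = c(1, 2)),
    second = data.frame(x2 = c("a", "b"), x3 = c(3, 4))
  )

  # fill = TRUE pads missing columns with NA;
  # idcol adds a column holding the list names
  combined <- data.table::rbindlist(dfs, fill = TRUE, idcol = "source")

  # rbindlist returns a data.table; convert if a plain data.frame is needed
  combined_df <- as.data.frame(combined)
  print(combined_df)
}
```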

Answer 2: (score: 1)

Let's assume you have the data frames in a nice list:

l <- list(
    data.frame(x2=rnorm(3),x1=rnorm(3)),
    data.frame(x1=rnorm(3),x2=rnorm(3),x3=rnorm(3),x4=rnorm(3),x5=rnorm(3)),
    data.frame(x5=rnorm(3),x2=rnorm(3),x3=rnorm(3),x4=rnorm(3),x1=rnorm(3)),
    data.frame(x5=rnorm(3),x2=rnorm(3),x3=rnorm(3),x4=rnorm(3)),
    data.frame(x2=rnorm(3),x1=rnorm(3),x3=rnorm(3),x4=rnorm(3))
)

Take the first one and (as @joran suggested) merge all the rest into it with, for example, a simple loop:

r <- l[[1]]
for (i in 2:length(l)) {
    r <- merge(r, l[[i]], all=TRUE)
}

Check out r:

> r
         x2        x3       x4       x1        x5
1  -1.72436 -0.774652  3.10001  0.23249 -1.278216
2  -1.25640        NA       NA  0.32997        NA
3  -1.00652 -0.946254  1.17313       NA  2.014517
4  -0.53770 -0.466626 -0.63369 -1.48375 -1.135515
5  -0.49787        NA       NA -0.34020        NA
6  -0.49704 -0.054175  0.85477       NA  0.831706
7   0.13027  0.421750 -0.18126 -0.65452  0.476576
8   0.18519 -1.006994  0.15141  0.66808        NA
9   0.33954 -0.224478  1.38596       NA  0.145807
10  0.57782  1.126430 -0.89582  0.80199        NA
11  0.59149 -0.447669  0.74855 -1.65790  0.059767
12  0.61374  0.751528 -1.93715  0.40125 -0.148243
13  0.89399  0.758481 -0.94801  0.05084        NA
14  0.94200        NA       NA  0.24945        NA
15  0.99509  0.586097 -0.91455 -0.49909  0.823696
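The same left-to-right fold can also be written with base R's `Reduce`, which avoids the explicit loop. This is only a sketch with made-up data (the names `frames`, `r_loop`, and `r_reduce` are illustrative), and it has the same caveat as the loop: `merge(..., all = TRUE)` is a full join on the shared columns, so rows are only kept separate when their shared values do not match.

```r
set.seed(1)  # only to make this sketch reproducible

frames <- list(
  data.frame(x2 = rnorm(3), x1 = rnorm(3)),
  data.frame(x1 = rnorm(3), x2 = rnorm(3), x3 = rnorm(3)),
  data.frame(x3 = rnorm(3), x2 = rnorm(3))
)

# Explicit loop, as in the answer
r_loop <- frames[[1]]
for (i in 2:length(frames)) {
  r_loop <- merge(r_loop, frames[[i]], all = TRUE)
}

# Equivalent fold: merge(merge(frames[[1]], frames[[2]]), frames[[3]])
r_reduce <- Reduce(function(a, b) merge(a, b, all = TRUE), frames)

identical(r_loop, r_reduce)
```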

I did not like that loop, so I wrote something that repeatedly merges adjacent pairs until a single data frame is left:

> merger <- function(l) lapply(2:length(l), function(x) merge(l[[x-1]], l[[x]], all=TRUE))
> while (length(l) != 1) l<-merger(l)
> l
[[1]]
         x2       x1        x3       x4        x5
1  -1.72436  0.23249 -0.774652  3.10001 -1.278216
2  -1.25640  0.32997        NA       NA        NA
3  -1.00652       NA -0.946254  1.17313  2.014517
4  -0.53770 -1.48375 -0.466626 -0.63369 -1.135515
5  -0.49787 -0.34020        NA       NA        NA
6  -0.49704       NA -0.054175  0.85477  0.831706
7   0.13027 -0.65452  0.421750 -0.18126  0.476576
8   0.18519  0.66808 -1.006994  0.15141        NA
9   0.33954       NA -0.224478  1.38596  0.145807
10  0.57782  0.80199  1.126430 -0.89582        NA
11  0.59149 -1.65790 -0.447669  0.74855  0.059767
12  0.61374  0.40125  0.751528 -1.93715 -0.148243
13  0.89399  0.05084  0.758481 -0.94801        NA
14  0.94200  0.24945        NA       NA        NA
15  0.99509 -0.49909  0.586097 -0.91455  0.823696