rbindlist data.tables with different numbers of columns

Time: 2013-10-25 07:45:18

Tags: r data.table

I would like to know how to rbindlist data.tables that have different numbers of columns, filling the missing entries with NA the way rbind.fill does.

 library(data.table)
 DT1 <- data.table(A = 1:3)
 DT2 <- data.table(A = 4:5, B = letters[4:5])
 l <- list(DT1, DT2)
 rbindlist(l)
 #  Error in rbindlist(l) : 
 #   Item 2 has 2 columns, inconsistent with item 1 which has 1 columns

What I want is:

   A B
1: 1 NA
2: 2 NA
3: 3 NA
4: 4 d
5: 5 e

1 answer:

Answer 0: (score: 7)

This feature is now implemented in commit 1266 of v1.9.3. From NEWS:

o  'rbindlist' gains 'use.names' and 'fill' arguments and is now implemented 
   entirely in C. Closes #5249    
  -> use.names by default is FALSE for backwards compatibility (doesn't bind by 
     names by default)
  -> rbind(...) now just calls rbindlist() internally, except that 'use.names' 
     is TRUE by default, for compatibility with base (and backwards compatibility).
  -> fill by default is FALSE. If fill is TRUE, use.names has to be TRUE.
  -> At least one item of the input list has to have non-null column names.
  -> Duplicate columns are bound in the order of occurrence, like base.
  -> Attributes that might exist in individual items would be lost in the bound result.
  -> Columns are coerced to the highest SEXPTYPE, if they are different, if/when possible.
  -> And incredibly fast ;).
  -> Documentation updated in much detail. Closes DR #5158.
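
As a rough sketch of what these NEWS items mean in practice, the snippet below applies fill=TRUE to the two tables from the question and then illustrates the rbind() default and the type coercion. It assumes a data.table version that includes this change (v1.9.3 or later); the object names filled, by_name and mixed are made up for illustration.

```r
library(data.table)

# Tables from the question: DT2 has an extra column B.
DT1 <- data.table(A = 1:3)
DT2 <- data.table(A = 4:5, B = letters[4:5])

# fill = TRUE pads the missing column B with NA for the rows coming from DT1.
# use.names = TRUE is passed explicitly, since fill requires binding by name.
filled <- rbindlist(list(DT1, DT2), use.names = TRUE, fill = TRUE)
# filled$A is 1:5, filled$B is NA, NA, NA, "d", "e"

# rbind() on data.tables binds by name by default, so the swapped
# column order in the second table is matched up by column name.
by_name <- rbind(data.table(x = 1, y = 2), data.table(y = 1, x = 2))
# by_name$x is 1, 2 and by_name$y is 2, 1

# Columns of different types are coerced to the higher type:
# integer and double give a double column.
mixed <- rbindlist(list(data.table(v = 1L), data.table(v = 2.5)))
# class(mixed$v) is "numeric"
```

The first call is the one that answers the question directly; the other two only illustrate the rbind() and coercion notes from the list above.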

See this post for benchmarks.


Examples:

1) Using the fill argument of rbindlist:

 DT1 <- data.table(x=1, y=2)
 DT2 <- data.table(y=2, z=-1)
 rbindlist(list(DT1, DT2), fill=TRUE)
 #     x y  z
 # 1:  1 2 NA
 # 2: NA 2 -1

Note that use.names should be TRUE when fill=TRUE.

2) Binding tables with duplicate names appropriately:

 DT1 <- data.table(x=1, x=2, y=1, y=2)
 DT2 <- data.table(y=3, y=-1, y=-2)
 rbindlist(list(DT1, DT2), fill=TRUE)
 #     x  x y  y  y
 # 1:  1  2 1  2 NA
 # 2: NA NA 3 -1 -2

3) It is not limited to data.tables; it also works with data.frames and lists:

 DT1 <- data.table(x=1, y=2)
 DT2 <- data.frame(y=2, z=-1)
 DT3 <- list(z=10)
 rbindlist(list(DT1, DT2, DT3), fill=TRUE)
 #     x  y  z
 # 1:  1  2 NA
 # 2: NA  2 -1
 # 3: NA NA 10

4) If you only want to bind by name, set use.names=TRUE without fill:

 DT1 <- data.table(x=1, y=2)
 DT2 <- data.table(y=1, x=2)
 rbindlist(list(DT1, DT2), use.names=TRUE, fill=FALSE)
 #    x y
 # 1: 1 2
 # 2: 2 1

 DT1 <- data.table(x=1, y=2)
 DT2 <- data.table(z=2, y=1)
 # returns error when fill=FALSE but can't be bound without fill=TRUE
 rbindlist(list(DT1, DT2), use.names=TRUE, fill=FALSE)
 # Error in rbindlist(list(DT1, DT2), use.names = TRUE, fill = FALSE) :
 #   Answer requires 3 columns whereas one or more item(s) in the input
 #   list has only 2 columns. ...

5) The default behaviour is unchanged for backward compatibility (use.names=FALSE and fill=FALSE).
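
The positional default can be sketched as follows. This is only an illustration, not code from the original answer: both arguments are spelled out explicitly rather than left to the defaults, so the result does not depend on which data.table version is installed, and the names by_position and by_name are hypothetical.

```r
library(data.table)

DT1 <- data.table(x = 1, y = 2)
DT2 <- data.table(y = 1, x = 2)

# Positional binding: column names of later items are ignored and the
# names of the first item are used, so DT2's first column lands under x.
by_position <- rbindlist(list(DT1, DT2), use.names = FALSE, fill = FALSE)
# by_position$x is 1, 1 and by_position$y is 2, 2

# For contrast, binding by name matches DT2's columns to x and y.
by_name <- rbindlist(list(DT1, DT2), use.names = TRUE)
# by_name$x is 1, 2 and by_name$y is 2, 1
```

With positional binding the values silently end up under the wrong names when the column order differs between items, which is why use.names=TRUE is usually the safer choice for inputs that are not known to share a column order.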

HTH