我刚发现这个错误,却发现有些人称之为"feature"。这使得rbindlist
不像do.call("rbind",l)
那样rbind
会尊重列名。此外,文档中没有提到这种完全出乎意料的行为。这真的是故意的吗?
代码示例:
> library(data.table)
> DT1 <- data.table(a=1, b=2)
> DT2 <- data.table(b=3, a=4)
> DT1
a b
1: 1 2
> DT2
b a
1: 3 4
我希望rbind
这些会产生a = 1,4的列; b = 2,3。使用rbind.data.table
和rbind.data.frame
即可获得该结果,但rbind.data.table
会产生警告。
> rbind(DT1, DT2)
a b
1: 1 2
2: 4 3
Warning message:
In data.table::.rbind.data.table(...) :
Argument 2 has names in a different order. Columns will be bound by name for consistency with base. You can drop names (by using an unnamed list) and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this warning.
> rbind(as.data.frame(DT1), as.data.frame(DT2))
a b
1 1 2
2 4 3
> do.call('rbind', list(DT1, DT2))
a b
1: 1 2
2: 4 3
Warning message:
In data.table::.rbind.data.table(...) :
Argument 2 has names in a different order. Columns will be bound by name for consistency with base. You can drop names (by using an unnamed list) and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this warning.
然而, rbindlist
很乐意默默地破坏数据:
> rbindlist(list(DT1, DT2))
a b
1: 1 2
2: 3 4
答案 0 :(得分:6)
o 'rbindlist' gains 'use.names' and 'fill' arguments and is now implemented
entirely in C. Closes #5249
-> use.names by default is FALSE for backwards compatibility (doesn't bind by
names by default)
-> rbind(...) now just calls rbindlist() internally, except that 'use.names'
is TRUE by default, for compatibility with base (and backwards compatibility).
-> fill by default is FALSE. If fill is TRUE, use.names has to be TRUE.
-> At least one item of the input list has to have non-null column names.
-> Duplicate columns are bound in the order of occurrence, like base.
-> Attributes that might exist in individual items would be lost in the bound result.
-> Columns are coerced to the highest SEXPTYPE, if they are different, if/when possible.
-> And incredibly fast ;).
-> Documentation updated in much detail. Closes DR #5158.
通过这种方式,您可以将use.names=TRUE
设置为按名称绑定。默认情况下,它设置为FALSE
以实现向后兼容性。或者,您可以使用rbind(..)
其中use.names=TRUE
,以便向后兼容。
有关更多示例,请参阅this post;有关基准,请参见this post。
1)只需设置use.names=TRUE
DT1 <- data.table(x=1, y=2)
DT2 <- data.table(y=1, x=2)
rbindlist(list(DT1,DT2), use.names=TRUE, fill=FALSE)
# x y
# 1: 1 2
# 2: 2 1
DT1 <- data.table(x=1, y=2)
DT2 <- data.table(z=2, y=1)
# returns error when fill=FALSE but can't be bound without fill=TRUE
rbindlist(list(DT1, DT2), use.names=TRUE, fill=FALSE)
# Error in rbindlist(list(DT1, DT2), use.names = TRUE, fill = FALSE) :
# Answer requires 3 columns whereas one or more item(s) in the input
# list has only 2 columns. ...
2)还按发生顺序绑定重复的列名称:
DT1 <- data.table(x=1, x=2, y=10, y=20, y=30)
DT2 <- data.table(y=-10, x=-2, y=-20, x=-1, y=-30)
rbindlist(list(DT1,DT2), use.names=TRUE)
# x x y y y
# 1: 1 2 10 20 30
# 2: -2 -1 -10 -20 -30
3)如果要按名称绑定并填充缺少的列
,请使用fill=TRUE
DT1 <- data.table(x=1, y=2)
DT2 <- data.table(y=2, z=-1)
rbindlist(list(DT1, DT2), fill=TRUE)
# x y z
# 1: 1 2 NA
# 2: NA 2 -1
HTH