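If I group using the `by` keyword in data.table, the `by` column is always returned as the first column. Is there a flag or option to tell it not to do this, or a clever way to get rid of it?
In particular, I want to `rbindlist` the grouped result onto the original table, so the question could also be phrased as: "how do I stop it from reordering the columns?"
For example: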
DT = data.table(I = as.numeric(1:6), N = rnorm(6), L = rep(c("a", "b", "c"), 2))
DT[, list(I = mean(I), N = mean(N)), by= L]
DT
gives:
> DT[, list(I = mean(I), N = mean(N)), by= L]
   L   I          N
1: a 2.5  0.4291802
2: b 3.5  0.6669517
3: c 4.5 -0.6471886
> DT
   I          N L
1: 1  1.8460998 a
2: 2  0.7093438 b
3: 3 -1.7991193 c
4: 4 -0.9877394 a
5: 5  0.6245596 b
6: 6  0.5047421 c
As for the `rbindlist` request, it would be nice to be able to do this:
DT = rbindlist(list(DT, DT[, list(I = mean(I), N = mean(N)), by= L]))
or
DT = rbindlist(list(DT, DT[, list(I = mean(I), N = mean(N), L), by= L]))
or something similar (neither of these works).
Answer 0 (score: 4)
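I don't particularly like this automatic column reordering either. The "trick" I usually use is to call `setcolorder` once I have the output, as follows: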
DT <- data.table(I = 1:6, N = rnorm(6), L = rep(c("a", "b", "c"), 2))
DT.out <- DT[, list(I = mean(I), N = mean(N)), by= L]
The `setcolorder` call is then:
setcolorder(DT.out, names(DT))
#      I            N L
# 1: 2.5  0.772719306 a
# 2: 3.5 -0.008921738 b
# 3: 4.5 -0.770807996 c
Of course, this only works if `DT` has the same column names as `DT.out`. Otherwise, you have to specify the column order explicitly:
setcolorder(DT.out, c("I", "N", "L"))
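Putting the two steps together for the case in the question, here is a minimal sketch. It assumes `I` is created with `as.numeric()` as in the question (so that it has the same type as `mean(I)`), and the `set.seed()` call and the names `agg` and `combined` are added here for illustration only:

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
DT <- data.table(I = as.numeric(1:6), N = rnorm(6), L = rep(c("a", "b", "c"), 2))

# Grouped means; the by column L comes back first
agg <- DT[, list(I = mean(I), N = mean(N)), by = L]

# Reorder the columns by reference to match the original table
setcolorder(agg, names(DT))

# Positions now line up, so binding by position is safe
combined <- rbindlist(list(DT, agg))
```

Because `setcolorder` modifies `agg` by reference, no copy of the aggregated table is made before binding.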
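Edit: Since you want to row-bind them right away, yes, it would be nice not to need this intermediate result. Because `rbindlist` appears to bind by position, you can use `rbind` instead, which binds by column name. data.table issues a warning when it does so and suggests `use.names=FALSE` if you would rather bind by position. You can safely ignore this warning.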
dt1 <- data.table(x=1:5, y=6:10)
dt2 <- data.table(y=1:5, x=6:10)
rbind(dt1, dt2) # or do.call(rbind, list(dt1, dt2))
#      x  y
#  1:  1  6
#  2:  2  7
#  3:  3  8
#  4:  4  9
#  5:  5 10
#  6:  6  1
#  7:  7  2
#  8:  8  3
#  9:  9  4
# 10: 10  5
# Warning message:
# In .rbind.data.table(...) :
# Argument 2 has names in a different order. Columns will be bound by name for
# consistency with base. Alternatively, you can drop names (by using an unnamed
# list) and the columns will then be joined by position. Or, set use.names=FALSE.
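As a hedged alternative, more recent data.table releases let `rbindlist` itself match columns by name through its `use.names` argument. The sketch below assumes such a version is installed; the version this answer was written against may behave differently, and `res` is a name introduced here for illustration:

```r
library(data.table)

dt1 <- data.table(x = 1:5, y = 6:10)
dt2 <- data.table(y = 1:5, x = 6:10)

# Assumption: the installed data.table supports use.names in rbindlist.
# use.names = TRUE matches columns by name instead of by position.
res <- rbindlist(list(dt1, dt2), use.names = TRUE)
```

With this, the grouped result from the question could be bound without reordering its columns first, provided the column names and types agree.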