Column order of `.SD` in the j argument differs when `get()` is used

Time: 2017-02-20 16:02:45

Tags: r data.table

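I often use the `.SD` option in data.table to transform subsets of data. It makes sense that the columns of `.SD` passed to `j` would be in the same order as in the original data.table.

Edited to correctly identify the problem

The columns of `.SD` are in the same order as specified in the `.SDcols` argument, which is great. That is not the case when `get` is used in the `j` argument (at least inside an `lapply` call). In that case, the columns of the `.SD` table keep their original order.

Is there a way to override this behavior?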

Example without `get` works fine

library(data.table)
dt = data.table(col1 = rep(LETTERS[1:3], 4), b = rnorm(12), a = 1:12, c = LETTERS[1:12])

# columns I want to do something to
d.vars = c('a', 'b')  # names in different order than names(dt)

# Generate columns of first differences by group
dt[, paste('d', d.vars, sep='.') := lapply(.SD, function(L) L - shift(L, n = 1, type='lag') ),
   keyby = col1, .SDcols = d.vars]

This would assign the differences to the "wrong" columns if the naming vector (`d.vars`) were ordered differently from the columns in `.SD`. The result is:

> dt
    col1           b  a c d.a         d.b
 1:    A -0.28901751  1 A  NA          NA
 2:    A  0.65746901  4 D   3  0.94648651
 3:    A -0.10602462  7 G   3 -0.76349362
 4:    A -0.38406252 10 J   3 -0.27803790
 5:    B -1.06963450  2 B  NA          NA
 6:    B  0.35137273  5 E   3  1.42100723
 7:    B  0.43394046  8 H   3  0.08256772
 8:    B  0.82525042 11 K   3  0.39130996
 9:    C  0.50421710  3 C  NA          NA
10:    C -1.09493665  6 F   3 -1.59915375
11:    C -0.04858163  9 I   3  1.04635501
12:    C  0.45867279 12 L   3  0.50725443

The result is as expected: the columns of the `.SD` table are ordered the same way as the names in `d.vars`.

Although `b` comes before `a` in `dt`, `lapply` in `j` processes `a` first and `b` second, which is the expected output.

Example with `get` behaves differently

dt2 = data.table(col1 = rep(LETTERS[1:3], 4), b = rnorm(12), a = 1:12, neg = -1, c = LETTERS[1:12])

# columns I want to do something to
d.vars = c('a', 'b')  # names in different order than names(dt2)

# name of variable to be called in j
negate <- 'neg'

dt2[, paste('d', d.vars, sep='.') := lapply(.SD, function(L) {
        (L - shift(L, n = 1, type='lag')) * get(negate)
      }), keyby = col1, .SDcols = d.vars]

Now the names of the newly created columns are inconsistent with the order of the names in `d.vars`:

> dt2
    col1          b  a neg c         d.a d.b
 1:    A -0.3539066  1  -1 A          NA  NA
 2:    A  0.2702374  4  -1 D -0.62414408  -3
 3:    A -0.7834941  7  -1 G  1.05373150  -3
 4:    A -1.2765652 10  -1 J  0.49307118  -3
 5:    B -0.2936422  2  -1 B          NA  NA
 6:    B -0.2451996  5  -1 E -0.04844252  -3
 7:    B -1.6577614  8  -1 H  1.41256181  -3
 8:    B  1.0668059 11  -1 K -2.72456737  -3
 9:    C -0.1160938  3  -1 C          NA  NA
10:    C -0.7940771  6  -1 F  0.67798333  -3
11:    C  0.2951743  9  -1 I -1.08925140  -3
12:    C -0.4508854 12  -1 L  0.74605969  -3

In the second example, the `b` column was processed first by `lapply` and was therefore assigned to `d.a`.

If I refer to `neg` directly (i.e., without `get`), the results are as expected: `lapply` processes the columns of `.SD` in the order given in `d.vars`.
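One way to check the order directly is to return `names(.SD)` from `j`, once without and once with `get()`. The following is only a sketch on a small made-up table: `dt4`, `sd_plain` and `sd_get` are illustrative names that do not appear in the question, and the order reported by the `get()` variant may depend on the installed data.table version.

```r
library(data.table)

# Small made-up table; column b comes before a, as in the question
dt4 <- data.table(col1 = rep(c("A", "B"), each = 2),
                  b    = c(0.5, 1.5, 2.5, 3.5),
                  a    = 1:4,
                  neg  = -1)
d.vars <- c("a", "b")   # names in a different order than names(dt4)
negate <- "neg"

# Without get(): report the column names of .SD for each group
sd_plain <- dt4[, list(sd_col = names(.SD)), by = col1, .SDcols = d.vars]

# With get() in j: same query, but j also reads a column through get()
sd_get <- dt4[, {
  get(negate)                    # value unused; only here to put get() into j
  list(sd_col = names(.SD))
}, by = col1, .SDcols = d.vars]

print(sd_plain)
print(sd_get)
```

If the two printed tables list the columns in different orders, the installed version shows the behavior described above.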

P.S. Thank you, data.table team! I love this package!

1 answer:

Answer 0 (score: 2)

Based on the description, we can use `match` to match 'd.vars' against the column names of 'dt' (giving 'd.vars1') and then use that to get the correct order

d.vars1 <- d.vars[match(names(dt), d.vars, nomatch = 0)]
dt[, paste0("d.",d.vars1) := lapply(.SD, function(L)
        L  - shift(L, n = 1, type='lag') ), keyby = col1, .SDcols = d.vars1]
dt
#    col1           b  a c         d.b d.a
# 1:    A -0.28901751  1 A          NA  NA
# 2:    A  0.65746901  4 D  0.94648652   3
# 3:    A -0.10602462  7 G -0.76349363   3
# 4:    A -0.38406252 10 J -0.27803790   3
# 5:    B -1.06963450  2 B          NA  NA
# 6:    B  0.35137273  5 E  1.42100723   3
# 7:    B  0.43394046  8 H  0.08256773   3
# 8:    B  0.82525042 11 K  0.39130996   3
# 9:    C  0.50421710  3 C          NA  NA
#10:    C -1.09493665  6 F -1.59915375   3
#11:    C -0.04858163  9 I  1.04635502   3
#12:    C  0.45867279 12 L  0.50725442   3
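To see why the `match` call returns the names in table order, it can help to run the steps on plain character vectors. This is a minimal base-R sketch; `tbl_names` and `idx` are illustrative names standing in for `names(dt)` and the intermediate index.

```r
# Column names as they appear in the table, and the names to transform
tbl_names <- c("col1", "b", "a", "c")   # stands in for names(dt)
d.vars    <- c("a", "b")

# For each table column, match() gives its position in d.vars (0 if absent)
idx <- match(tbl_names, d.vars, nomatch = 0)
print(idx)

# Zero indices are dropped when subsetting, leaving d.vars in table order
d.vars1 <- d.vars[idx]
print(d.vars1)

# intersect() keeps the order of its first argument, so it gives the same names
print(intersect(tbl_names, d.vars))
```

Because `d.vars1` is used both to build the new column names and as `.SDcols`, names and values stay aligned whenever `.SD` follows the table order.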

Update

Based on the new dataset

d.vars1 <- d.vars[match(names(dt2), d.vars, nomatch = 0)]
# parentheses around the difference so it is negated as a whole, as in the question
dt2[, paste0('d.', d.vars1) := lapply(.SD, function(L) 
    (L - shift(L, n = 1, type='lag')) * get(negate) ), 
            keyby = col1, .SDcols = d.vars1]
dt2
#    col1          b  a neg c         d.b d.a
# 1:    A -0.3539066  1  -1 A          NA  NA
# 2:    A  0.2702374  4  -1 D -0.62414408  -3
# 3:    A -0.7834941  7  -1 G  1.05373150  -3
# 4:    A -1.2765652 10  -1 J  0.49307118  -3
# 5:    B -0.2936422  2  -1 B          NA  NA
# 6:    B -0.2451996  5  -1 E -0.04844252  -3
# 7:    B -1.6577614  8  -1 H  1.41256181  -3
# 8:    B  1.0668059 11  -1 K -2.72456737  -3
# 9:    C -0.1160938  3  -1 C          NA  NA
#10:    C -0.7940771  6  -1 F  0.67798333  -3
#11:    C  0.2951743  9  -1 I -1.08925140  -3
#12:    C -0.4508854 12  -1 L  0.74605969  -3
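An alternative that does not rely on the order of `.SD` at all is to loop over the names and pull each column out of `.SD` by name. The sketch below is not part of the original answer: `dt3` and its values are made up so the result can be checked by hand, and `by` is used instead of `keyby` only to keep the example simple.

```r
library(data.table)

# Deterministic toy data (values are made up for illustration)
dt3 <- data.table(
  col1 = rep(c("A", "B"), each = 3),
  b    = c(10, 12, 15, 1, 4, 9),
  a    = 1:6,
  neg  = -1
)
d.vars <- c("a", "b")   # deliberately not in the same order as names(dt3)
negate <- "neg"

# Iterate over the names, not over .SD itself, and index .SD by name,
# so each new column d.<name> is computed from the column <name>.
dt3[, paste0("d.", d.vars) := lapply(d.vars, function(v) {
  L <- .SD[[v]]
  (L - shift(L, n = 1, type = "lag")) * get(negate)
}), by = col1, .SDcols = d.vars]

print(dt3)
```

Here `d.a` should hold the negated first differences of `a` and `d.b` those of `b`, whatever order the columns of `.SD` arrive in.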