I have the following data:
library(data.table)
dat1 <- data.table(id=1:9,
group=c(1,1,2,2,2,3,3,3,3),
t=c(14,17,20,21,26,89,90,95,99),
index=c(1,2,1,2,3,1,2,3,4)
)
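I want to compute the difference in t from the previous value, in the order given by index. For the first instance of each group, I want to compute the difference from an external variable instead: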
dat2 <- data.table(group=c(1,2,3),
start=c(10,15,80)
)
The result should be:
> res
id group t index dif
1: 1 1 14 1 4
2: 2 1 17 2 3
3: 3 2 20 1 5
4: 4 2 21 2 1
5: 5 2 26 3 5
6: 6 3 89 1 9
7: 7 3 90 2 1
8: 8 3 95 3 5
9: 9 3 99 4 4
I tried using
dat1[ , ifelse(index == min(index), dif := t - dat2$start, dif := t - t[-1]), by = group]
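but I am not sure how to refer, in one step, both to other elements of the same group and to an external element. Is this possible with data.table?
Answer 0 (score: 4)
A possible solution: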
dat1[, dif := ifelse(index == min(index),
t - dat2$start[match(.BY, dat2$group)],
t - shift(t))
, by = group][]
which gives:
   id group  t index dif
1:  1     1 14     1   4
2:  2     1 17     2   3
3:  3     2 20     1   5
4:  4     2 21     2   1
5:  5     2 26     3   5
6:  6     3 89     1   9
7:  7     3 90     2   1
8:  8     3 95     3   5
9:  9     3 99     4   4
A variant that avoids ifelse, as proposed by @jogo in the comments:
dat1[, dif := t - shift(t), by = group
][index == 1, dif := t - dat2[group==.BY, start], by = group][]
Answer 1 (score: 3)
I would try to avoid ifelse and use data.table's efficient join capabilities:
dat1[dat2, on = "group", # join on group
start := i.start][, # add start value
diff := diff(c(start[1L], t)), by = group][, # compute difference
start := NULL] # remove start value
The resulting table is:
# id group t index diff
#1: 1 1 14 1 4
#2: 2 1 17 2 3
#3: 3 2 20 1 5
#4: 4 2 21 2 1
#5: 5 2 26 3 5
#6: 6 3 89 1 9
#7: 7 3 90 2 1
#8: 8 3 95 3 5
#9: 9 3 99 4 4
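To see why the diff() step works, here is a minimal sketch in base R using only the group 1 values from the question; the names start1, t1 and d1 are hypothetical and only for illustration. Prepending the start value makes the first difference the distance from start, and the remaining ones the usual differences from the previous t.

```r
# Values for group 1, copied from dat1 and dat2 in the question
start1 <- 10          # start for group 1
t1 <- c(14, 17)       # t values for group 1

# diff() on c(start, t) returns one difference per element of t
d1 <- diff(c(start1, t1))
print(d1)
```

Because diff() returns a vector one element shorter than its input, prepending a single value keeps the result the same length as t, which is what := needs within each group.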
Answer 2 (score: 3)
You can use shift with a dynamic fill argument: index 'dat2' with .BY to get the 'start' value for each 'group':
dat1[ , dif := t - shift(t, fill = dat2[group == .BY, start]), by = group]
# id group t index dif
# 1: 1 1 14 1 4
# 2: 2 1 17 2 3
# 3: 3 2 20 1 5
# 4: 4 2 21 2 1
# 5: 5 2 26 3 5
# 6: 6 3 89 1 9
# 7: 7 3 90 2 1
# 8: 8 3 95 3 5
# 9: 9 3 99 4 4
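As a rough illustration of what the fill argument does, the sketch below applies shift to the group 1 values only, outside any grouping; t_group1, start1 and dif1 are hypothetical names used just for this example.

```r
library(data.table)

t_group1 <- c(14, 17)   # t values for group 1 (from dat1)
start1 <- 10            # start for group 1 (from dat2)

lag_default <- shift(t_group1)                 # first element is NA
lag_filled  <- shift(t_group1, fill = start1)  # first element is start1

dif1 <- t_group1 - lag_filled
print(dif1)
```

Without fill, the first row of each group would get NA; supplying the group's start value there is what replaces the ifelse branch.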
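Alternatively, you can do this in steps. It may be a matter of taste, but I find it more transparent than the ifelse way.
First do the 'normal' shift. Then add an 'index' variable to 'dat2' and do an update join.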
dat1[ , dif := t - shift(t), by = group]
dat2[ , index := 1]
dat1[dat2, on = .(group, index), dif := t - start]
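For completeness, here is a self-contained sketch of the stepwise approach that can be run from a fresh session and checked against the result requested in the question. Two things in it are my own assumptions rather than part of the answer: the setorder call, which guards against rows that are not already sorted by group and index (shift works on row order), and the explicit i.start prefix, which just makes clear that start comes from dat2.

```r
library(data.table)

dat1 <- data.table(id = 1:9,
                   group = c(1, 1, 2, 2, 2, 3, 3, 3, 3),
                   t = c(14, 17, 20, 21, 26, 89, 90, 95, 99),
                   index = c(1, 2, 1, 2, 3, 1, 2, 3, 4))
dat2 <- data.table(group = c(1, 2, 3),
                   start = c(10, 15, 80))

# Assumption: sort first so that shift() sees rows in index order
setorder(dat1, group, index)

# Step 1: difference from the previous t within each group (NA on first rows)
dat1[, dif := t - shift(t), by = group]

# Step 2: update join that overwrites only the first row of each group
dat2[, index := 1]
dat1[dat2, on = .(group, index), dif := t - i.start]

# Compare with the result shown in the question
expected <- c(4, 3, 5, 1, 5, 9, 1, 5, 4)
stopifnot(isTRUE(all.equal(dat1$dif, expected)))
```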