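I have the following data.table and a variable of interest, x. I want to create another variable that flags the jump of x from 0 to 1, meaning that x is 0 in every year before some year and 1 in every year from then on. This should be done by id_d.
Is there a simple data.table way to do this?
Original data table: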
library(data.table)

fullDat <- data.table(id_d = rep(letters[1:3], each = 12),
                      year = rep(1:12, 3),
                      x = c(rep(0, 5), rep(1, 7), 0, 1, 0, 1, 2, 2, 4, rep(5, 5), 1, rep(0, 3), rep(1, 8)))
id_d year x
1: a 1 0
2: a 2 0
3: a 3 0
4: a 4 0
5: a 5 0
6: a 6 1
7: a 7 1
8: a 8 1
9: a 9 1
10: a 10 1
11: a 11 1
12: a 12 1
13: b 1 0
14: b 2 1
15: b 3 0
16: b 4 1
17: b 5 2
18: b 6 2
19: b 7 4
20: b 8 5
21: b 9 5
22: b 10 5
23: b 11 5
24: b 12 5
25: c 1 1
26: c 2 0
27: c 3 0
28: c 4 0
29: c 5 1
30: c 6 1
31: c 7 1
32: c 8 1
33: c 9 1
34: c 10 1
35: c 11 1
36: c 12 1
id_d year x
Desired result:
id_d year x jump
1: a 1 0 0
2: a 2 0 0
3: a 3 0 0
4: a 4 0 0
5: a 5 0 0
6: a 6 1 1
7: a 7 1 0
8: a 8 1 0
9: a 9 1 0
10: a 10 1 0
11: a 11 1 0
12: a 12 1 0
13: b 1 0 0
14: b 2 1 0
15: b 3 0 0
16: b 4 1 0
17: b 5 2 0
18: b 6 2 0
19: b 7 4 0
20: b 8 5 0
21: b 9 5 0
22: b 10 5 0
23: b 11 5 0
24: b 12 5 0
25: c 1 1 0
26: c 2 0 0
27: c 3 0 0
28: c 4 0 0
29: c 5 1 0
30: c 6 1 0
31: c 7 1 0
32: c 8 1 0
33: c 9 1 0
34: c 10 1 0
35: c 11 1 0
36: c 12 1 0
id_d year x jump
Answer 0 (score: 3)
"the variable is 0 before some year and 1 in all years after"
# find rows to assign one
wDT = fullDat[, .(year = year[with(rle(x),
if (identical(values, c(0, 1))) first(lengths) + 1L
else 0L
)]), by=id_d]
# initialize to zero
fullDat[, jump := 0L ]
# update join to assign ones
fullDat[wDT, on=.(id_d, year), jump := 1L ]
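There is no need to build the intermediate table wDT; writing its full code directly into the final statement works as well. In fact, it could all go on one line if desired, like...
DT[, x := 0L][code_for_wDT, on=on_cols, x := 1L]
Alternatively, skip the join and just use the row numbers from .I: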
# find rows to assign one
w = fullDat[, with(rle(x), .I[
if (identical(values, c(0, 1))) first(lengths) + 1L
else 0L
]), by=id_d]$V1
# initialize to zero
fullDat[, jump := 0L ]
# update to assign ones
fullDat[w, jump := 1L ]
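Both variants above hinge on the `rle()` test, so it may help to look at it on plain vectors. The following is a minimal base R sketch, not part of the original answer: `jump_pos` is a hypothetical helper name, and it returns `integer(0)` where the answer indexes with `0L` (both select nothing).

```r
# The three groups' x values from the question, as plain vectors.
xa <- c(rep(0, 5), rep(1, 7))            # group a
xb <- c(0, 1, 0, 1, 2, 2, 4, rep(5, 5))  # group b
xc <- c(1, rep(0, 3), rep(1, 8))         # group c

# Hypothetical helper: position of the jump, or integer(0) when the runs
# are not exactly "a block of 0s followed by a block of 1s".
jump_pos <- function(x) {
  r <- rle(x)
  if (identical(r$values, c(0, 1))) r$lengths[1] + 1L else integer(0)
}

rle(xa)$values   # 0 1
rle(xa)$lengths  # 5 7
jump_pos(xa)     # 6
jump_pos(xb)     # integer(0): the runs are 0 1 0 1 2 4 5
jump_pos(xc)     # integer(0): the runs are 1 0 1

# identical() is type-sensitive: an integer x would not match c(0, 1).
identical(c(0L, 1L), c(0, 1))  # FALSE
```

In the question's data x is built with `c(rep(0, 5), ...)` and is therefore double, so the comparison against `c(0, 1)` behaves as intended; with an integer column one would presumably compare against `c(0L, 1L)` instead.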
Answer 1 (score: 2)
We can do
fullDat[, jump := {i1 <- which.max(x)
if(all(x[i1:.N]==1)) replace(rep(0, .N), i1, 1) else 0},
id_d]
fullDat
# id_d year x jump
# 1: a 1 0 0
# 2: a 2 0 0
# 3: a 3 0 0
# 4: a 4 0 0
# 5: a 5 0 0
# 6: a 6 1 1
# 7: a 7 1 0
# 8: a 8 1 0
# 9: a 9 1 0
#10: a 10 1 0
#11: a 11 1 0
#12: a 12 1 0
#13: b 1 0 0
#14: b 2 1 0
#15: b 3 0 0
#16: b 4 1 0
#17: b 5 2 0
#18: b 6 2 0
#19: b 7 4 0
#20: b 8 5 0
#21: b 9 5 0
#22: b 10 5 0
#23: b 11 5 0
#24: b 12 5 0
#25: c 1 1 0
#26: c 2 0 0
#27: c 3 0 0
#28: c 4 0 0
#29: c 5 1 0
#30: c 6 1 0
#31: c 7 1 0
#32: c 8 1 0
#33: c 9 1 0
#34: c 10 1 0
#35: c 11 1 0
#36: c 12 1 0
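To follow what the `which.max()` expression does within each group, here is a small base R sketch on the question's three x vectors. `mark_jump` is a hypothetical helper that mirrors the `j` expression; it returns `rep(0, n)` where the data.table version returns a scalar `0` that is recycled across the group.

```r
xa <- c(rep(0, 5), rep(1, 7))            # group a
xb <- c(0, 1, 0, 1, 2, 2, 4, rep(5, 5))  # group b
xc <- c(1, rep(0, 3), rep(1, 8))         # group c

# Hypothetical helper mirroring the j expression for a single group.
mark_jump <- function(x) {
  n  <- length(x)
  i1 <- which.max(x)                 # index of the FIRST maximum of x
  if (all(x[i1:n] == 1)) replace(rep(0, n), i1, 1) else rep(0, n)
}

which.max(xa)   # 6: first 1 in group a
which.max(xb)   # 8: first 5 in group b, so x[8:12] == 1 fails
which.max(xc)   # 1: the leading 1, and x[1:12] contains 0s
mark_jump(xa)   # 0 0 0 0 0 1 0 0 0 0 0 0
```

The check relies on the maximum of a qualifying group being 1, so that everything before the first maximum is smaller than 1; for the non-negative values in the question that means 0.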
Or a slightly more compact option is
fullDat[, jump := if(all(cumsum(diff(x)) %in% c(0,1))) c(0, diff(x)) else 0 ,id_d]
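The compact version can be read through the identity `cumsum(diff(x)) == x[-1] - x[1]`: the condition asks that every later value equals the first value or the first value plus one. A short base R sketch, with `passes` as a hypothetical helper name for the condition:

```r
xa <- c(rep(0, 5), rep(1, 7))     # group a
xc <- c(1, rep(0, 3), rep(1, 8))  # group c

# cumsum(diff(x)) is each later value minus the first value.
all.equal(cumsum(diff(xa)), xa[-1] - xa[1])  # TRUE

# Hypothetical helper: the condition used in the one-liner.
passes <- function(x) all(cumsum(diff(x)) %in% c(0, 1))

passes(xa)       # TRUE
passes(xc)       # FALSE: the first difference is -1
c(0, diff(xa))   # 0 0 0 0 0 1 0 0 0 0 0 0

# The condition alone does not rule out alternating input:
passes(c(0, 1, 0, 1))  # TRUE
```

On the question's data this gives the desired column, since group b contains values above 1 and group c starts at 1. A group that alternates between 0 and 1 would pass the condition, so it may be worth checking the input if that can occur.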
Answer 2 (score: 0)
fullDat[, jump := (cumsum(x==0)==(1:.N - 1L)) & (rev(cumsum(rev(x==1))) == .N:1), id_d]
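How this works:
cumsum(x==0) == (1:.N - 1L)
checks whether the number of zeros up to and including this row equals the number of preceding rows.
rev(cumsum(rev(x==1))) == .N:1
checks whether the number of ones counted backwards from the last row (down to this row) equals the number of rows from here to the end.
Together, the two conditions hold only at a row where every earlier row is 0 and this row and every later row are 1.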
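The two conditions can be traced by hand on a short vector. This is a base R sketch with a made-up five-element x, where `n` stands in for `.N`; the variable names are illustrative only.

```r
x <- c(0, 0, 1, 1, 1)   # made-up example: two 0s, then three 1s
n <- length(x)          # plays the role of .N

# Zeros up to and including each row, against the number of preceding rows.
cumsum(x == 0)                      # 1 2 2 2 2
seq_len(n) - 1L                     # 0 1 2 3 4
zero_check <- cumsum(x == 0) == (seq_len(n) - 1L)
zero_check                          # FALSE FALSE TRUE FALSE FALSE

# Ones from each row to the end, against the number of rows from here on.
rev(cumsum(rev(x == 1)))            # 3 3 3 2 1
n:1                                 # 5 4 3 2 1
one_check <- rev(cumsum(rev(x == 1))) == n:1
one_check                           # FALSE FALSE TRUE TRUE TRUE

jump <- zero_check & one_check
jump                                # FALSE FALSE TRUE FALSE FALSE
as.integer(jump)                    # 0 0 1 0 0
```

The result is a logical vector; wrapping it in `as.integer()` gives the 0/1 coding shown in the desired output.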