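I want to generate numbers in a new column that count down to an event occurring in another column. What would be the most straightforward way to do this in either R or Python?
Current data: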
var1 var2 event
0.658 72.193 0
0.641 70.217 0
0.641 40.173 0
0.652 52.687 0
0.531 50.652 0
0.529 39.497 1
0.651 29.291 0
0.634 59.548 0
0.711 51.925 0
0.635 75.772 0
0.710 53.378 1
0.660 87.744 0
0.540 62.547 0
0.618 38.050 0
0.602 60.978 1
Desired output:
var1 var2 event event_lead
0.658 72.193 0 -5
0.641 70.217 0 -4
0.641 40.173 0 -3
0.652 52.687 0 -2
0.531 50.652 0 -1
0.529 39.497 1 0
0.651 29.291 0 -4
0.634 59.548 0 -3
0.711 51.925 0 -2
0.635 75.772 0 -1
0.710 53.378 1 0
0.660 87.744 0 -3
0.540 62.547 0 -2
0.618 38.050 0 -1
0.602 60.978 1 0
Answer 0 (score: 2)
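Using R, we can try data.table. We create a grouping variable (cumsum(event == 1)), build the reversed sequence within each group, multiply it by -1, and assign (:=) the result to 'event_lead'. Then we multiply that output by the logical vector (!event), so that 'event_lead' is 0 wherever 'event' is 1.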
library(data.table)
# count down within each group, then zero out the event rows
setDT(df1)[, event_lead := -(.N:1), by = cumsum(event == 1)
          ][, event_lead := event_lead * (!event)]
df1
# var1 var2 event event_lead
# 1: 0.658 72.193 0 -5
# 2: 0.641 70.217 0 -4
# 3: 0.641 40.173 0 -3
# 4: 0.652 52.687 0 -2
# 5: 0.531 50.652 0 -1
# 6: 0.529 39.497 1 0
# 7: 0.651 29.291 0 -4
# 8: 0.634 59.548 0 -3
# 9: 0.711 51.925 0 -2
#10: 0.635 75.772 0 -1
#11: 0.710 53.378 1 0
#12: 0.660 87.744 0 -3
#13: 0.540 62.547 0 -2
#14: 0.618 38.050 0 -1
#15: 0.602 60.978 1 0
Or we can use ave from base R:
with(df1, ave(event, cumsum(event == 1),
              FUN = function(x) rev(seq_along(x)) * -1) * (!event))
#[1] -5 -4 -3 -2 -1 0 -4 -3 -2 -1 0 -3 -2 -1 0
Or, as @thelatemail mentioned:
with(df1, ave(event, rev(cumsum(rev(event))),
FUN=function(x) seq_along(x) - length(x)) )
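Since the question also asks about Python, here is a minimal sketch of the same countdown in plain Python (no pandas). It walks the event column backwards so that the distance to the next event is always known. The function name event_lead and the choice to return None for rows after the last event are assumptions of this sketch, not part of the R solutions above.

```python
def event_lead(events):
    """For each row, return minus the number of rows until the next event.

    Event rows get 0. Rows with no later event get None (an assumption
    of this sketch; the sample data always ends on an event).
    """
    lead = [None] * len(events)
    steps = None  # distance to the next event; None until one is seen
    # Walk backwards so the next event is known when each row is visited.
    for i in range(len(events) - 1, -1, -1):
        if events[i] == 1:
            steps = 0
        elif steps is not None:
            steps += 1
        lead[i] = -steps if steps is not None else None
    return lead


# The event column from the question's sample data.
events = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
print(event_lead(events))
```

On the sample data this should reproduce the event_lead column from the desired output. Note that the cumsum-based R versions would assign negative counts to trailing rows after the last event, whereas this sketch marks them as None.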