Generate numbers leading up to an event

Time: 2016-07-15 03:37:22

Tags: python r

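I want to generate numbers in a new column that count down to the event occurring in another column. What would be the most straightforward way to do this in R or Python?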

Current data:

var1    var2    event
0.658   72.193  0
0.641   70.217  0
0.641   40.173  0
0.652   52.687  0
0.531   50.652  0
0.529   39.497  1
0.651   29.291  0
0.634   59.548  0
0.711   51.925  0
0.635   75.772  0
0.710   53.378  1
0.660   87.744  0
0.540   62.547  0
0.618   38.050  0
0.602   60.978  1

Desired output:

var1    var2    event   event_lead
0.658   72.193  0         -5
0.641   70.217  0         -4
0.641   40.173  0         -3
0.652   52.687  0         -2
0.531   50.652  0         -1
0.529   39.497  1          0
0.651   29.291  0         -4
0.634   59.548  0         -3
0.711   51.925  0         -2
0.635   75.772  0         -1
0.710   53.378  1          0
0.660   87.744  0         -3
0.540   62.547  0         -2
0.618   38.050  0         -1
0.602   60.978  1          0

1 answer:

Answer 0: (score: 2)

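Using R, we can try data.table. We create a grouping variable (cumsum(event == 1)), get the reverse sequence within each group, multiply it by -1, and assign (:=) the result to 'event_lead'. Then we multiply that output by the logical vector (!event), so that wherever 'event' is 1, 'event_lead' becomes 0.
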
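library(data.table)
# Step 1: within each cumsum(event == 1) group, assign -(.N:1), i.e. -N, ..., -1
# Step 2: multiply by !event so that rows where event == 1 become 0
setDT(df1)[, event_lead := -(.N:1), by = cumsum(event == 1)
         ][, event_lead := event_lead * (!event)]
df1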
#    var1   var2 event event_lead
# 1: 0.658 72.193     0         -5
# 2: 0.641 70.217     0         -4
# 3: 0.641 40.173     0         -3
# 4: 0.652 52.687     0         -2
# 5: 0.531 50.652     0         -1
# 6: 0.529 39.497     1          0
# 7: 0.651 29.291     0         -4
# 8: 0.634 59.548     0         -3
# 9: 0.711 51.925     0         -2
#10: 0.635 75.772     0         -1
#11: 0.710 53.378     1          0
#12: 0.660 87.744     0         -3
#13: 0.540 62.547     0         -2
#14: 0.618 38.050     0         -1
#15: 0.602 60.978     1          0
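
Since the question also asks about Python, the same two steps can be sketched with the standard library alone. This is only an illustration under assumptions: the function name event_lead_forward is hypothetical, and the event column is taken to be a plain list of 0/1 values rather than a data frame column.

```python
from itertools import accumulate, groupby


def event_lead_forward(event):
    """Hypothetical helper mirroring the two data.table steps on a list of 0/1 flags."""
    # Grouping key: running count of events seen so far, like cumsum(event == 1).
    groups = list(accumulate(1 if e == 1 else 0 for e in event))
    lead = []
    # The running count never decreases, so equal keys are always consecutive.
    for _, members in groupby(groups):
        n = len(list(members))
        # Reverse sequence times -1, like -(.N:1): -n, ..., -1.
        lead.extend(range(-n, 0))
    # Same effect as multiplying by (!event): rows with an event become 0.
    return [0 if e == 1 else v for e, v in zip(event, lead)]


event = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
print(event_lead_forward(event))
# [-5, -4, -3, -2, -1, 0, -4, -3, -2, -1, 0, -3, -2, -1, 0]
```

The masking step is needed because each event row opens a new group, so it initially receives the largest negative value of that group.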

Or we can use ave from base R:

with(df1, ave(event, cumsum(event == 1), FUN = function(x)
                rev(seq_along(x)) * -1) * (!event))
#[1] -5 -4 -3 -2 -1  0 -4 -3 -2 -1  0 -3 -2 -1  0

Or, as @thelatemail mentioned:

with(df1, ave(event, rev(cumsum(rev(event))), 
           FUN=function(x) seq_along(x) - length(x)) )
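
The reverse cumulative sum used here puts every row in the same group as the next event at or after it, so no masking step is needed. A rough Python rendering of that idea, again assuming a plain list of 0/1 values and using the hypothetical name event_lead_reverse:

```python
from itertools import accumulate, groupby


def event_lead_reverse(event):
    """Hypothetical helper mirroring rev(cumsum(rev(event))) as the grouping key."""
    # Cumulative sum over the reversed flags, reversed back:
    # each row shares a key with the next event at or after it.
    keys = list(accumulate(reversed(event)))[::-1]
    lead = []
    # The keys never increase, so equal keys are always consecutive.
    for _, members in groupby(keys):
        n = len(list(members))
        # seq_along(x) - length(x): -(n - 1), ..., 0
        lead.extend(range(1 - n, 1))
    return lead


event = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
print(event_lead_reverse(event))
# [-5, -4, -3, -2, -1, 0, -4, -3, -2, -1, 0, -3, -2, -1, 0]
```

Note that in this sketch, rows after the last event form their own group and still count down to 0 on the final row, even though no event follows them. The sample data ends on an event, so this does not show up there.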