Generating numbers in a new column after an event in another column

Time: 2016-07-19 03:31:25

Tags: r

I previously asked this question and got the following solution, which works well for generating negative numbers counting down to each event.

   library(data.table)
   setDT(df1)[, event_lead:=-(.N:1) ,cumsum(event == 1)
     ][, event_lead := event_lead* (!event)]
   df1
   #    var1   var2 event event_lead
   # 1: 0.658 72.193     0         -5
   # 2: 0.641 70.217     0         -4
   # 3: 0.641 40.173     0         -3
   # 4: 0.652 52.687     0         -2
   # 5: 0.531 50.652     0         -1
   # 6: 0.529 39.497     1          0
   # 7: 0.651 29.291     0         -4
   # 8: 0.634 59.548     0         -3
   # 9: 0.711 51.925     0         -2
   #10: 0.635 75.772     0         -1
   #11: 0.710 53.378     1          0
   #12: 0.660 87.744     0         -3
   #13: 0.540 62.547     0         -2
   #14: 0.618 38.050     0         -1
   #15: 0.602 60.978     1          0

Now I am trying to adapt this code to get another column of positive numbers counting up after each event.

    > setDT(df1)[, event_lead:=-(.N:1) ,cumsum(event == 1)
    +            ][, event_lead := event_lead* (!event)]


    > setDT(df1)[, event_follow:=+(1:.N) ,cumsum(event == 1)
    +            ][, event_follow := event_follow* (!event)]

    > df1
         var1   var2 event event_lead event_follow
     1: 0.658 72.193     0         -5            1
     2: 0.641 70.217     0         -4            2
     3: 0.641 40.173     0         -3            3
     4: 0.652 52.687     0         -2            4
     5: 0.531 50.652     0         -1            5
     6: 0.529 39.497     1          0            0
     7: 0.651 29.291     0         -4            2
     8: 0.634 59.548     0         -3            3
     9: 0.711 51.925     0         -2            4
    10: 0.635 75.772     0         -1            5
    11: 0.710 53.378     1          0            0
    12: 0.660 87.744     0         -3            2
    13: 0.540 62.547     0         -2            3
    14: 0.618 38.050     0         -1            4
    15: 0.602 60.978     1          0            0

Why does event_follow skip 1 after each 0, and how can I fix it?

1 Answer:

Answer 0 (score: 1)

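We can create a grouping variable by taking the lag of the logical vector (event == 1) and then its cumsum. Within each group, we multiply the row sequence by the negated 'event' vector and assign (:=) the result to 'event_follow'.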

df1[, event_follow := seq_len(.N) * !event ,cumsum(shift(event ==1, fill = FALSE))]
df1
#     var1   var2 event event_lead event_follow
# 1: 0.658 72.193     0         -5            1
# 2: 0.641 70.217     0         -4            2
# 3: 0.641 40.173     0         -3            3
# 4: 0.652 52.687     0         -2            4
# 5: 0.531 50.652     0         -1            5
# 6: 0.529 39.497     1          0            0
# 7: 0.651 29.291     0         -4            1
# 8: 0.634 59.548     0         -3            2
# 9: 0.711 51.925     0         -2            3
#10: 0.635 75.772     0         -1            4
#11: 0.710 53.378     1          0            0
#12: 0.660 87.744     0         -3            1
#13: 0.540 62.547     0         -2            2
#14: 0.618 38.050     0         -1            3
#15: 0.602 60.978     1          0            0
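
For readers without the original df1, the following is a minimal self-contained sketch of the same idea. It rebuilds only the 'event' column from the output above; the table name dt and the explicit by = argument are choices made here for illustration, not part of the original answer.

```r
library(data.table)

# Only the 'event' column from the post; var1 and var2 are not needed here.
dt <- data.table(event = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1))

# shift() lags the logical vector by one row (filling the first row with FALSE),
# so the cumulative sum increases on the row AFTER each event.
# Each event row therefore stays in the same group as the rows before it.
dt[, event_follow := seq_len(.N) * (!event),
   by = cumsum(shift(event == 1, fill = FALSE))]

dt$event_follow
# [1] 1 2 3 4 5 0 1 2 3 4 0 1 2 3 0
```

The parentheses around !event are optional in this expression but make the intended precedence explicit.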

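In the OP's post, 'event_follow' was created by grouping on the cumulative sum of event == 1, so a new group starts at every row where 'event' is 1. If we check the output: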

df1[, event_follow1 := +(1:.N) ,cumsum(event == 1)]
df1$event_follow1
#[1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 1

which(df1$event ==1)
#[1]  6 11 15
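Elements 6, 11 and 15 are where a new sequence starts. When we multiply by the new condition, event_follow* (!event), the positions where 'event' is 1 are FALSE in the logical vector, so all of those elements of 'event_follow' become 0. The 1 of each new sequence falls on the event row and is zeroed out, which is why the first row after each event shows 2.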
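
To make the difference concrete, the sketch below puts the two grouping keys side by side on the same 'event' values. The column names grp_at_event, grp_after_event, naive and fixed are hypothetical names chosen here for illustration.

```r
library(data.table)

dt <- data.table(event = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1))

# Grouping used in the question: the group id changes ON the event row.
dt[, grp_at_event := cumsum(event == 1)]
# 0 0 0 0 0 1 1 1 1 1 2 2 2 2 3

# Grouping used in the answer: the group id changes on the row AFTER the event.
dt[, grp_after_event := cumsum(shift(event == 1, fill = FALSE))]
# 0 0 0 0 0 0 1 1 1 1 1 2 2 2 2

# Same counting expression, different grouping key.
dt[, naive := seq_len(.N) * (!event), by = grp_at_event]
dt[, fixed := seq_len(.N) * (!event), by = grp_after_event]

dt$naive
# [1] 1 2 3 4 5 0 2 3 4 5 0 2 3 4 0
dt$fixed
# [1] 1 2 3 4 5 0 1 2 3 4 0 1 2 3 0
```

In the naive column the event row consumes position 1 of its group before being set to 0, whereas in the fixed column the event row is the last position of the preceding group.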