Using data.table to identify all occurrences of an event and pick the first one when it repeats in sequence

Time: 2016-09-06 02:13:03

Tags: r data.table cumsum

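I am trying to identify all occurrences of an event and, when it repeats in sequence, pick the first occurrence. I can flag the event and add a count, but I cannot reset the count after the event changes.

My data has ~100 rows with 30-odd IDs. I have included only one ID here, but the real data has 30-odd. The table has ID, datetime and Status.

Status is an event that can take multiple values - A, B, C, ... The event I care about is B.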

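I want to add three columns -

Occurrence_B - a flag that is 1 when the event is B

Count_B - counts consecutive occurrences of event = B, resetting whenever the event changes

Include_B - a flag showing whether a given occurrence is the first one or a continuation

I will then subset the data to Include_B == 'new' to pick the first occurrence in each sequence.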

ID  Date              Status  Occurrence_B  Count_B  Include_B
A   7/28/15 12:00 AM  A       0             0        0
A   7/28/15 12:30 AM  A       0             0        0
A   7/30/15 12:00 AM  B       1             1        new
A   7/31/15 12:00 AM  B       1             2        continued
A   7/31/15 11:00 AM  B       1             3        continued
A   8/2/15 10:00 AM   B       1             4        continued
A   8/3/15 12:00 AM   C       0             0        0
A   8/4/15 12:00 AM   B       1             1        new
A   8/5/15 12:00 AM   B       1             2        continued
A   8/6/15 12:00 AM   A       0             0        0
A   8/7/15 12:00 AM   B       1             1        new


My sample code -

d1[, Occurrence_B:=Status %in% c('B')+0L]

d1[, Count_B := cumsum(Occurrence_B), by=.(ID,Status)]

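The problem is that I do not know how to reset Count_B after the event changes. I have been looking into it, but I am new to data.table, so any help is much appreciated.

Let me know if you have any questions.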

SK

1 answer:

Answer 0: (score: 2)

You can try something like this:

# Create the Occurrence_B flag and initialise Include_B as NA
(d1[, `:=` (Occurrence_B = as.integer(Status == "B"), Include_B = NA_character_)]

  # Compute Count_B, using rleid(Occurrence_B) as the grouping variable so that
  # consecutive identical values fall into the same group
  [, Count_B := cumsum(Occurrence_B), by = rleid(Occurrence_B)]

  # Update Include_B in place based on Count_B: Count_B == 1 marks the first
  # occurrence ("new"), Count_B > 1 marks a continuation; other rows stay NA
  [Count_B == 1, Include_B := "new"][Count_B > 1, Include_B := "continued"][])

#    ID             Date Status Occurrence_B Count_B Include_B
# 1:  A 7/28/15 12:00 AM      A            0       0        NA
# 2:  A 7/28/15 12:30 AM      A            0       0        NA
# 3:  A 7/30/15 12:00 AM      B            1       1       new
# 4:  A 7/31/15 12:00 AM      B            1       2 continued
# 5:  A 7/31/15 11:00 AM      B            1       3 continued
# 6:  A  8/2/15 10:00 AM      B            1       4 continued
# 7:  A  8/3/15 12:00 AM      C            0       0        NA
# 8:  A  8/4/15 12:00 AM      B            1       1       new
# 9:  A  8/5/15 12:00 AM      B            1       2 continued
#10:  A  8/6/15 12:00 AM      A            0       0        NA
#11:  A  8/7/15 12:00 AM      B            1       1       new
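
Since the question mentions about 30 IDs, it may be worth letting the run id depend on ID as well, so that a run of B's at the end of one ID does not carry over into the next. The following is only a sketch of that idea: the second ID "X" and all of its rows are invented for illustration, and it assumes the rows are already sorted by ID and then by date.

```r
library(data.table)

# Hypothetical sample: ID "X" and its rows are made up for illustration.
# ID "A" ends with two B's and ID "X" starts with two B's.
d1 <- data.table(
  ID     = c(rep("A", 6), rep("X", 3)),
  Status = c("A", "B", "B", "C", "B", "B", "B", "B", "A")
)

# Flag rows where the event of interest occurs.
d1[, Occurrence_B := as.integer(Status == "B")]

# rleid(ID, Occurrence_B) starts a new run whenever either ID or the flag
# changes, so the count resets at every ID boundary as well.
d1[, Count_B := cumsum(Occurrence_B), by = rleid(ID, Occurrence_B)]

# Label the first row of each run of B's and the rows that continue it.
d1[, Include_B := NA_character_]
d1[Count_B == 1, Include_B := "new"]
d1[Count_B > 1, Include_B := "continued"]

# Keep only the first occurrence in each run, as described in the question.
first_B <- d1[Include_B == "new"]
```

With by = rleid(Occurrence_B) alone, the last two B's of "A" and the first two of "X" would be treated as a single run of four, and the first B of "X" would be labelled "continued" rather than "new".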