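I'm trying to identify every occurrence of an event and, when it repeats consecutively, select only the first occurrence in that run. I can flag the event and add a count, but I can't get the count to reset after the event changes.
My data has ~100 rows with about 30 IDs; I've included only one ID here. The table has the columns ID, Date (a datetime) and Status.
Status is the event, which can take multiple values: A, B, C, ... The event I care about is B.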
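I want to add three columns:
Occurrence_B - flags whether the event is B
Count_B - counts consecutive occurrences of event = B, resetting whenever the event changes
Include_B - flags whether a given occurrence is the first one in its run ("new") or a continuation ("continued")
I will then subset the data to Include_B == 'new' to select the first occurrence in each sequence.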
ID Date Status Occurrence_B Count_B Include_B
A 7/28/15 12:00 AM A 0 0 0
A 7/28/15 12:30 AM A 0 0 0
A 7/30/15 12:00 AM B 1 1 new
A 7/31/15 12:00 AM B 1 2 continued
A 7/31/15 11:00 AM B 1 3 continued
A 8/2/15 10:00 AM B 0 0 0
A 8/3/15 12:00 AM C 0 0 0
A 8/4/15 12:00 AM B 1 1 new
A 8/5/15 12:00 AM B 1 2 continued
A 8/6/15 12:00 AM A 1 0 continued
A 8/7/15 12:00 AM B 1 1 new
My sample code:
d1[, Occurrence_B:=Status %in% c('B')+0L]
d1[, Count_B := cumsum(Occurrence_B), by=.(ID,Status)]
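The problem is that I don't know how to reset Count_B after the event changes. I'm still looking into it, but I'm new to data.table, so any help is greatly appreciated.
Let me know if you have any questions.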
SK
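To see why the attempt above never resets, here is a minimal sketch that rebuilds the Status sequence for ID A from the sample table (Date is left out because it does not affect the count) and runs the two original lines. Grouping by (ID, Status) puts every B row of an ID into a single group, so cumsum() keeps counting straight through the intervening C and A rows.

```r
library(data.table)

# Status sequence for ID "A", copied from the sample table (Date omitted)
d1 <- data.table(
  ID     = "A",
  Status = c("A", "A", "B", "B", "B", "B", "C", "B", "B", "A", "B")
)

# The two lines from the question, unchanged
d1[, Occurrence_B := Status %in% c("B") + 0L]
d1[, Count_B := cumsum(Occurrence_B), by = .(ID, Status)]

# All seven B rows share one group, so the count runs 1..7 instead of restarting
print(d1$Count_B)
```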
Answer (score: 2)
You can try something like this:
library(data.table)
# create the Occurrence_B column and initialize Include_B as NA
(d1[, `:=` (Occurrence_B = as.integer(Status == "B"), Include_B = NA_character_)]
# calculate Count_B using rleid(Occurrence_B) as the grouping variable, which puts
# consecutive identical values into the same group
[, Count_B := cumsum(Occurrence_B), by = rleid(Occurrence_B)]
# update Include_B in place based on Count_B: Count_B == 1 marks the first occurrence,
# Count_B > 1 marks a continuation, and all other rows stay NA
[Count_B == 1, Include_B := "new"][Count_B > 1, Include_B := "continued"][])
# ID Date Status Occurrence_B Count_B Include_B
# 1: A 7/28/15 12:00 AM A 0 0 NA
# 2: A 7/28/15 12:30 AM A 0 0 NA
# 3: A 7/30/15 12:00 AM B 1 1 new
# 4: A 7/31/15 12:00 AM B 1 2 continued
# 5: A 7/31/15 11:00 AM B 1 3 continued
# 6: A 8/2/15 10:00 AM B 1 4 continued
# 7: A 8/3/15 12:00 AM C 0 0 NA
# 8: A 8/4/15 12:00 AM B 1 1 new
# 9: A 8/5/15 12:00 AM B 1 2 continued
#10: A 8/6/15 12:00 AM A 0 0 NA
#11: A 8/7/15 12:00 AM B 1 1 new
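One caveat, since the real data has about 30 IDs: rleid(Occurrence_B) is computed over the whole table, so if one ID ends on a B row and the next ID starts with one, the run (and Count_B) would carry across the ID boundary. A minimal sketch, assuming the rows are already sorted by ID and Date, passes ID to rleid() as well so a new run starts whenever either value changes, then keeps only the first row of each run (the Include_B == "new" subset from the question). The second ID "Z" is hypothetical, added only to show the boundary case; Date is omitted.

```r
library(data.table)

# Status sequence for ID "A" from the question, plus a hypothetical ID "Z"
# that starts with B right after A's final B row
d1 <- data.table(
  ID     = c(rep("A", 11), "Z", "Z"),
  Status = c("A", "A", "B", "B", "B", "B", "C", "B", "B", "A", "B", "B", "B")
)

d1[, Occurrence_B := as.integer(Status == "B")]
# rleid(ID, Occurrence_B) starts a new run when either ID or Occurrence_B changes
d1[, Count_B := cumsum(Occurrence_B), by = rleid(ID, Occurrence_B)]
d1[, Include_B := NA_character_][Count_B == 1L, Include_B := "new"][Count_B > 1L, Include_B := "continued"]

# First occurrence of each run of B, per ID
first_B <- d1[Include_B == "new"]
print(first_B)
```

Without ID inside rleid(), the first "Z" row would be counted as 2 and labelled "continued".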