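I have a data.table with many events for different clients ("client") and want to split the events of each client at every gap ("missing events").
E.g. suppose I have monthly event data: one or more months without an event is a "gap", while events in consecutive months belong to the same group: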
library(data.table)
library(lubridate) # for ymd()
dt <- data.table(client.no = c(rep("Client_A", 3), rep("Client_B", 5), rep("Client_C", 2)),
event.date = ymd(20160101, 20160201, 20160301, 20151201, 20160101, 20160301, 20160501, 20160601, 20140701, 20150101))
This gives dt:
client.no event.date
1: Client_A 2016-01-01
2: Client_A 2016-02-01
3: Client_A 2016-03-01
4: Client_B 2015-12-01
5: Client_B 2016-01-01
6: Client_B 2016-03-01
7: Client_B 2016-05-01
8: Client_B 2016-06-01
9: Client_C 2014-07-01
10: Client_C 2015-01-01
The result should be a group number for each row, shared by all rows in the same group, e.g.:
client.no event.date group.no
1: Client_A 2016-01-01 1
2: Client_A 2016-02-01 1
3: Client_A 2016-03-01 1
4: Client_B 2015-12-01 1
5: Client_B 2016-01-01 1
6: Client_B 2016-03-01 2
7: Client_B 2016-05-01 3
8: Client_B 2016-06-01 3
9: Client_C 2014-07-01 1
10: Client_C 2015-01-01 2
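The group number does not have to restart at one for each client (but that would be nice).
You can assume that the events are sorted within each client and that there are no duplicate event dates within the same client.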
Answer 0 (score: 3)
You can use cumsum:
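# diff() gives the gap in days between consecutive events; a gap of more than 31 days starts a new group
dt[, z := cumsum(c(1, diff(event.date) > 31)), by = client.no]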
Output:
client.no event.date z
1: Client_A 2016-01-01 1
2: Client_A 2016-02-01 1
3: Client_A 2016-03-01 1
4: Client_B 2015-12-01 1
5: Client_B 2016-01-01 1
6: Client_B 2016-03-01 2
7: Client_B 2016-05-01 3
8: Client_B 2016-06-01 3
9: Client_C 2014-07-01 1
10: Client_C 2015-01-01 2
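The 31-day threshold works here because every event falls on the first of a month. If the dates could fall anywhere within a month, a variant that compares calendar months instead of day counts may be more robust. The following is only a sketch under that assumption: month.idx is a hypothetical helper column that is not part of the original answer, the result is written to group.no as in the question, and the sample data is rebuilt with base as.Date() so that lubridate is not required.

```r
library(data.table)

# Same sample data as in the question, built with base as.Date()
dt <- data.table(
  client.no  = c(rep("Client_A", 3), rep("Client_B", 5), rep("Client_C", 2)),
  event.date = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01",
                         "2015-12-01", "2016-01-01", "2016-03-01",
                         "2016-05-01", "2016-06-01",
                         "2014-07-01", "2015-01-01"))
)

# Hypothetical helper column: months counted on one continuous scale,
# so that December -> January is a step of exactly 1
dt[, month.idx := data.table::year(event.date) * 12L + data.table::month(event.date)]

# Within each client, a step of more than one month marks the start of a new group;
# cumsum() over those markers yields a group number that restarts at 1 per client
dt[, group.no := cumsum(c(1L, diff(month.idx) > 1L)), by = client.no]

# Drop the helper column again
dt[, month.idx := NULL]

print(dt)
```

Like the answer above, this relies on the events being sorted within each client, which the question guarantees.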