I have a data frame like this:
user time
____ ____
1 2017-09-01 00:01:01
1 2017-09-01 00:01:20
1 2017-09-01 00:03:01
1 2017-09-01 00:10:01
1 2017-09-01 00:11:01
2 2017-09-01 00:01:03
2 2017-09-01 00:01:08
2 2017-09-01 00:03:01
From this data frame, I want to create a follow group for each user, as shown below:
user time follow_group
____ ____________________ _____________
1 2017-09-01 00:01:01 1
1 2017-09-01 00:01:20 1
1 2017-09-01 00:03:01 1
1 2017-09-01 00:10:01 2
1 2017-09-01 00:11:01 2
2 2017-09-01 00:01:03 1
2 2017-09-01 00:01:08 1
2 2017-09-01 00:03:01 1
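For each user, the follow group should change whenever the time difference from the previous row is greater than 5 minutes.
I tried lagging the time and subtracting: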
data[, previous_request_time:=c(NA, time[-.N]), by=user]
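But this doesn't seem to work. Any help is appreciated.
Answer 0 (score: 4)
Just use difftime and check whether the difference from the previous row is greater than 5 minutes. The cumulative sum of that check then gives you the group counter: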
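# shift() lags time within each user; fill=-Inf makes the first row's
# difference infinite, so every user starts at group 1
dat[,
  follow_group := cumsum(difftime(time, shift(time, fill=-Inf), units="mins") > 5),
  by=user
]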
# user time follow_group
#1: 1 2017-09-01 00:01:01 1
#2: 1 2017-09-01 00:01:20 1
#3: 1 2017-09-01 00:03:01 1
#4: 1 2017-09-01 00:10:01 2
#5: 1 2017-09-01 00:11:01 2
#6: 2 2017-09-01 00:01:03 1
#7: 2 2017-09-01 00:01:08 1
#8: 2 2017-09-01 00:03:01 1
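For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample from the question by hand and assumes `time` is stored as POSIXct (the question does not say); the `"UTC"` timezone and the helper column `gap_mins` are my own choices for illustration. It is a variation on the code above, not the same code: the first row of each user is detected via `NA` rather than `fill=-Inf`.

```r
library(data.table)

# Sample data from the question; POSIXct and tz = "UTC" are assumptions.
dat <- data.table(
  user = c(1, 1, 1, 1, 1, 2, 2, 2),
  time = as.POSIXct(
    c("2017-09-01 00:01:01", "2017-09-01 00:01:20", "2017-09-01 00:03:01",
      "2017-09-01 00:10:01", "2017-09-01 00:11:01", "2017-09-01 00:01:03",
      "2017-09-01 00:01:08", "2017-09-01 00:03:01"),
    tz = "UTC"
  )
)

# The lag only makes sense if rows are in time order within each user.
setorder(dat, user, time)

# Gap to the previous row in minutes; the first row of each user gets NA.
dat[, gap_mins := as.numeric(difftime(time, shift(time), units = "mins")), by = user]

# A new group starts on a user's first row or after a gap of more than 5 minutes.
dat[, follow_group := cumsum(is.na(gap_mins) | gap_mins > 5), by = user]

print(dat)
```

Keeping the gap in its own column makes it easy to eyeball why a group boundary was placed where it was; drop it with `dat[, gap_mins := NULL]` once you are satisfied.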
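If you would rather not spell out the units, you can use diff instead. Convert time to numeric first so the differences are always in seconds (diff on a POSIXct column chooses its units automatically):
dat[, flwgrp := cumsum(c(Inf, diff(as.numeric(time))) > 5*60), by=user]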
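A note on units when calling diff on date-times, since it is easy to trip over: on POSIXct values, diff returns a difftime whose units are picked from the smallest gap, so a threshold written in seconds is only meaningful if the result happens to be in seconds. The sketch below uses made-up timestamps (not the question's data) where every gap is at least a minute, to show the difference in plain base R.

```r
# Hypothetical timestamps with gaps of 7 and 10 minutes (no gap under 60 seconds).
t1 <- as.POSIXct(
  c("2017-09-01 00:01:00", "2017-09-01 00:08:00", "2017-09-01 00:18:00"),
  tz = "UTC"
)

d_auto <- diff(t1)              # difftime with automatically chosen units
units(d_auto)                   # "mins" here, because the smallest gap is >= 60 s
as.numeric(d_auto)              # 7 10

d_secs <- diff(as.numeric(t1))  # plain numbers, always seconds
d_secs                          # 420 600

# The same "more than 5 minutes" test, written against a seconds threshold.
flag_auto <- c(Inf, d_auto) > 5 * 60   # compares minutes with 300: gaps are missed
flag_secs <- c(Inf, d_secs) > 5 * 60   # compares seconds with 300: gaps are found

cumsum(flag_auto)               # 1 1 1
cumsum(flag_secs)               # 1 2 3
```

With the question's sample data both users have at least one gap under a minute, so the automatic units come out as seconds and either form gives the expected groups; the numeric conversion just removes the dependence on the data.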