I have a dataset of users with sequential events and non-event rows (NA) in between.
library(data.table)
DT = data.table(user = c("1001","1001","1001","1001","1001","1001",
                         "1002","1002","1002","1002"),
                event = c(NA,"e1",NA,NA,NA,"e2",
                          "e1",NA,NA,"e2"))
Within each user group, I want to count the rows (non-events) that precede each event. Expected result:
user event rows.before.event
1: 1001 NA NA
2: 1001 e1 1
3: 1001 NA NA
4: 1001 NA NA
5: 1001 NA NA
6: 1001 e2 3
7: 1002 e1 0
8: 1002 NA NA
9: 1002 NA NA
10: 1002 e2 2
I tried rleid() but without success. Any suggestions are welcome.
Answer 0 (score: 8)
DT[, count := .N-1, by = .(user, rev(cumsum(rev(!is.na(event)))))][
is.na(event), count := NA]
# user event count
# 1: 1001 NA NA
# 2: 1001 e1 1
# 3: 1001 NA NA
# 4: 1001 NA NA
# 5: 1001 NA NA
# 6: 1001 e2 3
# 7: 1002 e1 0
# 8: 1002 NA NA
# 9: 1002 NA NA
#10: 1002 e2 2
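To see why the grouping in this answer works, it may help to look at the key on its own. The sketch below rebuilds the sample data from the question and prints the reverse cumulative count of events; the variable name grp is only for illustration. Each event row shares a key value with the NA rows directly above it, so .N - 1 is the number of NA rows before that event.

```r
library(data.table)

DT <- data.table(
  user  = c("1001","1001","1001","1001","1001","1001",
            "1002","1002","1002","1002"),
  event = c(NA,"e1",NA,NA,NA,"e2","e1",NA,NA,"e2")
)

# Count events from the bottom up, then flip back to the original order.
# An event row and the NA rows immediately before it get the same value.
grp <- DT[, rev(cumsum(rev(!is.na(event))))]
print(grp)
# 4 4 3 3 3 3 2 1 1 1
```

The key is computed over the whole table rather than per user, which is why user is also part of by.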
Answer 1 (score: 6)
A solution with rleid and shift:
DT[, before := .N, by = .(user, rleid(is.na(event)))
][, before := shift(before, fill = 0), by = user
][is.na(event), before := NA][]
which gives:
user event before
1: 1001 NA NA
2: 1001 e1 1
3: 1001 NA NA
4: 1001 NA NA
5: 1001 NA NA
6: 1001 e2 3
7: 1002 e1 0
8: 1002 NA NA
9: 1002 NA NA
10: 1002 e2 2
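The two steps of this approach can be inspected separately. The sketch below uses the sample data from the question; the column names run_len and prev_len, and the small table adj with user "u", are made up for illustration. As far as I can tell, the approach assumes that two events never sit on consecutive rows for the same user: in that case both events fall into one non-NA run of length 2, and the second event is assigned 2 rather than 0.

```r
library(data.table)

DT <- data.table(
  user  = c("1001","1001","1001","1001","1001","1001",
            "1002","1002","1002","1002"),
  event = c(NA,"e1",NA,NA,NA,"e2","e1",NA,NA,"e2")
)

# Step 1: length of each run of NA / non-NA rows within a user
DT[, run_len := .N, by = .(user, rleid(is.na(event)))]
# Step 2: the previous row's run length within the same user (0 for the first row)
DT[, prev_len := shift(run_len, fill = 0L), by = user]
print(DT)

# Hypothetical edge case: two events on adjacent rows
adj <- data.table(user = "u", event = c("e1", "e2"))
adj[, before := .N, by = .(user, rleid(is.na(event)))
    ][, before := shift(before, fill = 0L), by = user
    ][is.na(event), before := NA]
print(adj$before)   # the second event gets 2, although no NA row precedes it
```

On the sample data, prev_len at the event rows matches the expected counts (1, 3, 0, 2).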
Answer 2 (score: 4)
> DT[, rows.before.event:= ifelse(is.na(event), NA, .N - 1) ,by = list(user, c(0, cumsum(!is.na(event))[-length(event)]))]
> DT
user event rows.before.event
1: 1001 NA NA
2: 1001 e1 1
3: 1001 NA NA
4: 1001 NA NA
5: 1001 NA NA
6: 1001 e2 3
7: 1002 e1 0
8: 1002 NA NA
9: 1002 NA NA
10: 1002 e2 2
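The grouping key here is a lagged forward cumulative count of events. A small check, under the assumption that only the partition into groups matters and not the key values themselves, suggests it splits the sample data the same way as a reverse cumulative count would. The names ev, fwd_key and rev_key are illustrative only.

```r
library(data.table)

ev <- c(NA,"e1",NA,NA,NA,"e2","e1",NA,NA,"e2")

# Lagged forward count: number of events seen strictly before each row
fwd_key <- c(0, cumsum(!is.na(ev))[-length(ev)])
# Reverse count: number of events from each row down to the end
rev_key <- rev(cumsum(rev(!is.na(ev))))

print(fwd_key)
print(rev_key)
# rleid() relabels consecutive runs 1, 2, 3, ..., so equal results
# mean that both keys cut the rows into the same blocks.
print(identical(rleid(fwd_key), rleid(rev_key)))
```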
Answer 3 (score: 3)
If you want another way to achieve the same thing:
library(zoo)
DT$group <- rev(na.locf(rev(DT$event)))
DT[, rowsbefore := lapply(.SD,function(x) {sum(is.na(x))}) , by = .(user,group)]
DT$rowsbefore <- ifelse(is.na(DT$event),NA,DT$rowsbefore)
> DT
user event group rowsbefore
1: 1001 NA e1 NA
2: 1001 e1 e1 1
3: 1001 NA e2 NA
4: 1001 NA e2 NA
5: 1001 NA e2 NA
6: 1001 e2 e2 3
7: 1002 e1 e1 0
8: 1002 NA e2 NA
9: 1002 NA e2 NA
10: 1002 e2 e2 2
If you would rather keep the group sum on every row than replace the non-event rows with NA, you can omit the last line.
Edit: following @Procrastinatus Maximus's comment below, a better way to write the same solution:
DT[, rowsbefore := sum(is.na(event)), by = .(user, rev(na.locf(rev(event))))
][is.na(event), rowsbefore := NA]