Count rows before an event - data.table

Asked: 2016-05-13 16:24:31

Tags: r data.table

I have a dataset of users with sequential events, and non-event rows (NA) in between.

library(data.table)

DT = data.table(user = c("1001","1001","1001","1001","1001","1001",
                          "1002","1002","1002","1002"), 
               event = c(NA,"e1",NA,NA,NA,"e2",
                           "e1",NA,NA,"e2"))

Within each user, I would like to count the non-event rows that precede each event. Expected result:

   user  event  rows.before.event
 1: 1001    NA                 NA
 2: 1001    e1                  1
 3: 1001    NA                 NA
 4: 1001    NA                 NA
 5: 1001    NA                 NA
 6: 1001    e2                  3
 7: 1002    e1                  0
 8: 1002    NA                 NA
 9: 1002    NA                 NA
10: 1002    e2                  2

I tried rleid() but without success. Any suggestions are welcome.

4 answers:

Answer 0: (score: 8)

DT[, count := .N-1, by = .(user, rev(cumsum(rev(!is.na(event)))))][
   is.na(event), count := NA]
#    user event count
# 1: 1001    NA    NA
# 2: 1001    e1     1
# 3: 1001    NA    NA
# 4: 1001    NA    NA
# 5: 1001    NA    NA
# 6: 1001    e2     3
# 7: 1002    e1     0
# 8: 1002    NA    NA
# 9: 1002    NA    NA
#10: 1002    e2     2
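
To see why this grouping works, it may help to look at the key on its own. The sketch below rebuilds the sample data and stores the reverse cumulative sum in a helper column; the column name grp is an assumption made for illustration and is not part of the answer. Each event row ends up sharing a key with the non-event rows directly above it, so .N - 1 is the number of rows before that event.

```r
library(data.table)

DT <- data.table(
  user  = c(rep("1001", 6), rep("1002", 4)),
  event = c(NA, "e1", NA, NA, NA, "e2", "e1", NA, NA, "e2")
)

# Number of events from each row to the end of the table (reverse cumulative sum).
# An event row and the NA rows immediately before it get the same value.
DT[, grp := rev(cumsum(rev(!is.na(event))))]
DT$grp
# [1] 4 4 3 3 3 3 2 1 1 1

# Group size minus the event row itself, then blank out the non-event rows.
DT[, count := .N - 1L, by = .(user, grp)][is.na(event), count := NA]
DT$count
# [1] NA  1 NA NA NA  3  0 NA NA  2
```

The key is computed over the whole table rather than per user, which is why user also has to appear in by.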

Answer 1: (score: 6)

A solution using rleid and shift:

DT[, before := .N, by = .(user, rleid(is.na(event)))
   ][, before := shift(before, fill = 0), by = user
     ][is.na(event), before := NA][]

which gives:

    user event before
 1: 1001    NA     NA
 2: 1001    e1      1
 3: 1001    NA     NA
 4: 1001    NA     NA
 5: 1001    NA     NA
 6: 1001    e2      3
 7: 1002    e1      0
 8: 1002    NA     NA
 9: 1002    NA     NA
10: 1002    e2      2
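
One case that may be worth checking before relying on this approach is two events in consecutive rows for the same user. The sample data has no such case, so the input below is hypothetical. Because rleid puts adjacent events into one run, the second event appears to pick up the length of that run instead of 0, whereas the reverse-cumsum grouping from the first answer returns 0.

```r
library(data.table)

# Hypothetical input: one user with two events in consecutive rows.
X <- data.table(user = "u", event = c(NA, "e1", "e2"))

# rleid/shift approach from this answer
X[, before := .N, by = .(user, rleid(is.na(event)))
  ][, before := shift(before, fill = 0), by = user
    ][is.na(event), before := NA]

# Reverse-cumsum grouping, for comparison
X[, count := .N - 1L, by = .(user, rev(cumsum(rev(!is.na(event)))))
  ][is.na(event), count := NA]

X$before  # NA 1 2: "e2" is assigned the run length of the two adjacent events
X$count   # NA 1 0: no non-event rows directly before "e2"
```

If events never occur back to back in your data, both approaches give the same result.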

Answer 2: (score: 4)

> DT[, rows.before.event:=  ifelse(is.na(event), NA, .N - 1) ,by = list(user, c(0, cumsum(!is.na(event))[-length(event)]))]
> DT
    user event rows.before.event
 1: 1001    NA                NA
 2: 1001    e1                 1
 3: 1001    NA                NA
 4: 1001    NA                NA
 5: 1001    NA                NA
 6: 1001    e2                 3
 7: 1002    e1                 0
 8: 1002    NA                NA
 9: 1002    NA                NA
10: 1002    e2                 2

Answer 3: (score: 3)

In case you want another way to achieve the same result:

library(zoo)
DT$group <- rev(na.locf(rev(DT$event))) 
DT[, rowsbefore := lapply(.SD,function(x) {sum(is.na(x))}) , by = .(user,group)]
DT$rowsbefore <- ifelse(is.na(DT$event),NA,DT$rowsbefore)

> DT
    user event group rowsbefore
 1: 1001    NA    e1         NA
 2: 1001    e1    e1          1
 3: 1001    NA    e2         NA
 4: 1001    NA    e2         NA
 5: 1001    NA    e2         NA
 6: 1001    e2    e2          3
 7: 1002    e1    e1          0
 8: 1002    NA    e2         NA
 9: 1002    NA    e2         NA
10: 1002    e2    e2          2

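If you would rather keep the group totals on every row instead of replacing them with NA on non-event rows, you can omit the last line.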

Edit: per @Procrastinatus Maximus' comment below, a better way to write the same solution:

DT[, rowsbefore := sum(is.na(event)), by = .(user, rev(na.locf(rev(event))))
   ][is.na(event), rowsbefore := NA]