How do I find the last log entry before an event happened? (R)

Time: 2016-08-01 14:45:50

Tags: r

I have a table like this (input):

user_id    event       timestamp
Rob        business    111111
Rob        business    222222
Mike       progress    111111
Mike       progress    222222
Rob        progress    000001
Mike       business    333333
Mike       progress    444444
Lee        progress    111111
Lee        progress    222222

dput of the table:

dput(input)
structure(list(user_id = structure(c(3L, 3L, 2L, 2L, 3L, 2L, 
2L, 1L, 1L), .Label = c("Lee", "Mike", "Rob"), class = "factor"), 
    event = structure(c(1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("business", 
    "progress"), class = "factor"), timestamp = c(111111, 222222, 
    111111, 222222, 1, 333333, 444444, 111111, 222222)), .Names = c("user_id", 
"event", "timestamp"), row.names = c(NA, -9L), class = "data.frame")

For each user, I want the last progress event before that user's first business event happened (output):

    user_id    event       timestamp
    Mike       progress    222222
    Rob        progress    000001

Thanks for your help!

2 answers:

Answer 0 (score: 2)

We can try data.table:

library(data.table)
setDT(df1)[df1[order(as.numeric(timestamp)), if(any(event == "business")) 
        .I[tail(which(cumsum(event == "business")==0),1)], user_id]$V1]   
#   user_id    event timestamp
#1:     Rob progress    000001
#2:    Mike progress    222222
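
For readers who want to follow the logic step by step, here is a minimal base R sketch of the same idea. It is not the answerer's code. It assumes the question's data with numeric timestamps, and last_before_business is a hypothetical helper name introduced only for illustration. For each user it sorts by timestamp, flags the rows from the first business event onward, and keeps the last row before that point. Users with no business event, such as Lee, return nothing.

```r
# Rebuild the question's example data (timestamps kept numeric).
input <- data.frame(
  user_id = c("Rob", "Rob", "Mike", "Mike", "Rob", "Mike", "Mike", "Lee", "Lee"),
  event = c("business", "business", "progress", "progress", "progress",
            "business", "progress", "progress", "progress"),
  timestamp = c(111111, 222222, 111111, 222222, 1, 333333, 444444, 111111, 222222),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one user's rows, return the last row that precedes
# that user's first "business" event, or zero rows if there is no such row.
last_before_business <- function(d) {
  d <- d[order(d$timestamp), ]
  # TRUE from the first business event onward, FALSE before it.
  seen_business <- cumsum(d$event == "business") > 0
  before <- which(!seen_business)
  if (!any(seen_business) || length(before) == 0) return(d[0, ])
  d[max(before), ]
}

# Apply the helper per user and stack the results.
result <- do.call(rbind, lapply(split(input, input$user_id), last_before_business))
rownames(result) <- NULL
result
```

With the example data this should leave one row for Mike (timestamp 222222) and one for Rob (timestamp 1), matching the requested output apart from row order.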

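Answer 1 (score: 1)

I'm not sure I fully understand what you are trying to do. Using which, you can get the indices of all non-business events (your data is called input):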

indexes <- which(input$event != "business")

Then you can filter this index vector so that only the non-business events before the last business event remain:

indexes <- indexes[indexes < max(which(input$event == "business"))]

Looking at the rows we are left with:

> input[indexes,]
  user_id    event timestamp
3    Mike progress    111111
4    Mike progress    222222
5     Rob progress         1
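
To see why rows 3, 4 and 5 are kept, it may help to inspect the intermediate values. The sketch below rebuilds only the event column from the question's dput output. Note that the comparison is on row position across the whole table, not on timestamp or user, so this filter does not by itself give a per-user result.

```r
# The event column from the question, in its original row order.
event <- c("business", "business", "progress", "progress", "progress",
           "business", "progress", "progress", "progress")

# Row positions of all non-business events.
non_business <- which(event != "business")   # 3 4 5 7 8 9

# Row position of the last business event in the table.
last_business <- max(which(event == "business"))   # 6

# Keep only the non-business rows that sit above that position.
kept <- non_business[non_business < last_business]   # 3 4 5
kept
```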