I have a table like this (input):
user_id event timestamp
Rob business 111111
Rob business 222222
Mike progress 111111
Mike progress 222222
Rob progress 000001
Mike business 333333
Mike progress 444444
Lee progress 111111
Lee progress 222222
dput of the table:
dput(input)
structure(list(user_id = structure(c(3L, 3L, 2L, 2L, 3L, 2L,
2L, 1L, 1L), .Label = c("Lee", "Mike", "Rob"), class = "factor"),
event = structure(c(1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("business",
"progress"), class = "factor"), timestamp = c(111111, 222222,
111111, 222222, 1, 333333, 444444, 111111, 222222)), .Names = c("user_id",
"event", "timestamp"), row.names = c(NA, -9L), class = "data.frame")
For each user, I want to find the last progress event before the first business event occurs (output):
user_id event timestamp
Mike progress 222222
Rob progress 000001
Thanks for your help!
Answer 0: (score: 2)
We can try data.table (df1 here is the input data frame):
library(data.table)
setDT(df1)[df1[order(as.numeric(timestamp)), if(any(event == "business"))
.I[tail(which(cumsum(event == "business")==0),1)], user_id]$V1]
# user_id event timestamp
#1: Rob progress 000001
#2: Mike progress 222222
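The key step in the one-liner is cumsum(event == "business") == 0, which stays TRUE only for rows that come before a user's first business event. As a minimal base R sketch of that step, here it is applied to Mike's events alone, already sorted by timestamp (the vector mike_events is made up for illustration from the question's table):

```r
# Mike's events in timestamp order (copied from the question's table)
mike_events <- c("progress", "progress", "business", "progress")

# Running count of business events seen so far
seen_business <- cumsum(mike_events == "business")

# Positions that come before the first business event
before_first <- which(seen_business == 0)

# The last such position is the row of interest
last_before <- tail(before_first, 1)
last_before
```

Here last_before is 2, i.e. Mike's second progress event (timestamp 222222), which matches the expected output. For a user with no business event the count never leaves 0, which is why the one-liner guards with if(any(event == "business")).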
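Answer 1: (score: 1)
I am not sure I fully understand what you are trying to do. Using which, you can get the indices of all non-business events (your data is called input):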
indexes <- which(input$event != "business")
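Then you can filter this index vector so that only the non-business events located before the last business event (by row position) remain: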
indexes <- indexes[indexes < max(which(input$event == "business"))]
Look at the rows we are left with:
> input[indexes,]
user_id event timestamp
3 Mike progress 111111
4 Mike progress 222222
5 Rob progress 1
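Note that this filter uses the row position of the last business event in the whole table, so it does not treat users separately or look at timestamps. If the goal is the last progress event before each user's first business event, a per-user variant of the same which idea might look like the sketch below. This is only an illustration under that assumption; the helper name last_progress_before_business is hypothetical, and the data frame is rebuilt from the question's table with character columns instead of factors.

```r
# Rebuild the question's table (character columns instead of factors)
input <- data.frame(
  user_id = c("Rob", "Rob", "Mike", "Mike", "Rob", "Mike", "Mike", "Lee", "Lee"),
  event = c("business", "business", "progress", "progress", "progress",
            "business", "progress", "progress", "progress"),
  timestamp = c(111111, 222222, 111111, 222222, 1, 333333, 444444, 111111, 222222),
  stringsAsFactors = FALSE
)

# Hypothetical helper: works on the rows of a single user
last_progress_before_business <- function(d) {
  d <- d[order(d$timestamp), ]                         # sort this user's events by time
  first_business <- which(d$event == "business")[1]    # NA if the user has no business event
  if (is.na(first_business)) return(d[0, ])            # no business event: return zero rows
  candidates <- which(d$event == "progress")
  candidates <- candidates[candidates < first_business]
  d[tail(candidates, 1), ]                             # zero rows if no progress came first
}

# Apply the helper to each user and stack the results
result <- do.call(rbind, lapply(split(input, input$user_id),
                                last_progress_before_business))
result
```

With the question's data this keeps one row for Mike (timestamp 222222) and one for Rob (timestamp 1), and drops Lee, who has no business event.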