I have the following data (df):
Id event label
1 eating 0
1 walking 0
1 sleeping finish
1 dreaming stage changed
1 snoring 0
2 drinking 0
2 running finish
2 resting 0
2 relaxing 0
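Here, for each Id (case), label = "finish" marks the completion of the case. I am trying to keep the records up to and including the row with label = "finish" and delete the remaining records for that Id. The result should look like this: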
Id event label
1 eating 0
1 walking 0
1 sleeping finish
2 drinking 0
2 running finish
I tried the following, but it did not help. Any suggestions would be appreciated. Thanks.
df <- data.table(df)
setDT(df)[label =="finish", by=parent_id]
Answer 0 (score: 3)
Using data.table, we can do this:
library(data.table)
# For each Id, keep rows 1 through the "finish" row (assumes exactly one "finish" per Id)
setDT(df)[, .SD[1:which(label == "finish")], by = Id]
# Id event label
#1: 1 eating 0
#2: 1 walking 0
#3: 1 sleeping finish
#4: 2 drinking 0
#5: 2 running finish
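The one-liner above relies on every Id having exactly one "finish" row: with none, `which()` returns an empty vector and `1:` fails. A minimal sketch of a variant that tolerates this, assuming rows within each Id are already in order. The extra Id 3 (with no "finish") and the name `res` are hypothetical, added only for illustration:

```r
library(data.table)

# Sample data from the question; Id 3 is a hypothetical case with no "finish" row
df <- data.table(
  Id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
  event = c("eating", "walking", "sleeping", "dreaming", "snoring",
            "drinking", "running", "resting", "relaxing",
            "reading", "writing"),
  label = c("0", "0", "finish", "stage changed", "0",
            "0", "finish", "0", "0",
            "0", "0")
)

# Keep a row only if no "finish" occurred on an earlier row of the same Id.
# cumsum() counts "finish" rows so far; subtracting the current row's own
# flag means the "finish" row itself is still kept.
res <- df[, .SD[cumsum(label == "finish") - (label == "finish") == 0], by = Id]
```

With this variant, an Id that never finishes keeps all of its rows, and an Id with several "finish" rows is cut at the first one. Whether that is the desired behaviour depends on the data.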
Answer 1 (score: 2)
A longer answer using base R, provided every Id has a "finish" row and all the observations are sorted as above:
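# First row of each Id (the column is named Id, not ID)
start <- which(!duplicated(df$Id))
# Row of the "finish" label for each Id, in the same order as start
end <- which(df$label == "finish")
# Pair each start with its end; this does not require Ids to be 1, 2, 3, ...
keepObs <- unlist(Map(seq, start, end))
dfKeepers <- df[keepObs, ]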
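If pairing row positions feels fragile, base R's `ave()` can compute the same filter per group. This is only a sketch, assuming rows within each Id are in chronological order. The names `is_finish` and `finished_before` are made up for this example:

```r
# Sample data from the question
df <- data.frame(
  Id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  event = c("eating", "walking", "sleeping", "dreaming", "snoring",
            "drinking", "running", "resting", "relaxing"),
  label = c("0", "0", "finish", "stage changed", "0",
            "0", "finish", "0", "0"),
  stringsAsFactors = FALSE
)

# 1 where the row is a "finish" row, 0 otherwise
is_finish <- as.integer(df$label == "finish")

# Within each Id, count the "finish" rows that occurred strictly before each row
finished_before <- ave(is_finish, df$Id, FUN = function(x) cumsum(x) - x)

# Keep rows up to and including the first "finish" of each Id
dfKeepers <- df[finished_before == 0, ]
```

Because `ave()` groups by value rather than by position, this form does not need the rows of one Id to be contiguous.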