I have a data frame that looks like this:
data = data.frame(userID = c("a","a","a","a","a","a","a","a","a","b","b"),
diff = c(1,1,1,81,1,1,1,2,1,1,1)
)
In the end, I would like to get something like this:
data = data.frame(userID = c("a","a","a","a","a","a","a","a","a","b","b"),
diff = c(1,1,1,81,1,1,1,2,1,1,1),
block = c(1,1,1,2,2,2,2,3,3,1,1)
)
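So basically, I want to start a new block every time the value in the diff column is greater than 1, and I want to do this separately for each userID group.
So far I have considered using LOCF or writing a loop, but neither seems to work. Any suggestions? Thanks!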
Answer 0: (score: 1)
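One option is to group by "userID" and then take the cumulative sum of the logical expression (diff > 1). Adding 1 makes the block numbering start at 1 within each group.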
library(dplyr)
data %>%
group_by(userID) %>%
mutate(block = 1 + cumsum(diff > 1))
# A tibble: 11 x 3
# Groups: userID [2]
# userID diff block
# <fct> <dbl> <dbl>
# 1 a 1 1
# 2 a 1 1
# 3 a 1 1
# 4 a 81 2
# 5 a 1 2
# 6 a 1 2
# 7 a 1 2
# 8 a 2 3
# 9 a 1 3
#10 b 1 1
#11 b 1 1
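To see why the cumulative sum produces block numbers, here is a minimal sketch of the intermediate steps in base R, using only the diff values of user "a" from the question. The variable names d, starts and block are chosen for illustration only.

```r
# diff values for user "a", copied from the question
d <- c(1, 1, 1, 81, 1, 1, 1, 2, 1)

# TRUE marks each row where a new block should start
starts <- d > 1
# FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE

# cumsum() treats TRUE as 1 and FALSE as 0, so the running total
# goes up by one at every block start
cumsum(starts)
# 0 0 0 1 1 1 1 2 2

# shift by 1 so the first block is numbered 1 instead of 0
block <- 1 + cumsum(starts)
block
# 1 1 1 2 2 2 2 3 3
```

Note that if the first diff in a group were itself greater than 1, the numbering for that group would start at 2 rather than 1.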
Answer 1: (score: 1)
In base R, you can use ave like this:
data$block <- ave(data$diff>1, data$userID, FUN=cumsum)+1
# userID diff block
#1 a 1 1
#2 a 1 1
#3 a 1 1
#4 a 81 2
#5 a 1 2
#6 a 1 2
#7 a 1 2
#8 a 2 3
#9 a 1 3
#10 b 1 1
#11 b 1 1
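As a quick sanity check, the result can be compared with the block column the question asks for. This is only a sketch: the expected vector is copied from the desired output in the question, and the name expected is introduced here for illustration.

```r
# Rebuild the example data from the question
data <- data.frame(
  userID = c("a","a","a","a","a","a","a","a","a","b","b"),
  diff   = c(1,1,1,81,1,1,1,2,1,1,1)
)

# Block numbers the question wants to end up with
expected <- c(1,1,1,2,2,2,2,3,3,1,1)

# Per-user cumulative count of rows with diff > 1, shifted to start at 1
data$block <- ave(data$diff > 1, data$userID, FUN = cumsum) + 1

# TRUE when every row matches the desired output
all(data$block == expected)
# TRUE
```

Both answers rely on the rows already being in the intended order within each userID, since cumsum works through them top to bottom.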