How do I create "blocks" based on column values within groups?

Time: 2019-07-23 15:46:22

Tags: r dataframe data-manipulation locf

I have a data frame that looks like this:

data = data.frame(userID = c("a","a","a","a","a","a","a","a","a","b","b"), 
                 diff = c(1,1,1,81,1,1,1,2,1,1,1)
)

In the end, I would like to get something like this:

data = data.frame(userID = c("a","a","a","a","a","a","a","a","a","b","b"), 
                  diff = c(1,1,1,81,1,1,1,2,1,1,1),
                  block = c(1,1,1,2,2,2,2,3,3,1,1)
)

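So basically, I want to start a new block every time the value in the diff column is greater than 1, and I want to do this separately within each userID group.

At the moment I am thinking of using LOCF or writing a loop, but neither seems to work. Any suggestions? Thanks!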

2 answers:

Answer 0 (score: 1)

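One option is to group by "userID" and then take the cumulative sum of the logical expression (diff > 1), adding 1 so that block numbering starts at 1: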

library(dplyr)
data %>% 
   group_by(userID) %>% 
   mutate(block = 1 + cumsum(diff > 1))
# A tibble: 11 x 3
# Groups:   userID [2]
#   userID  diff block
#   <fct>  <dbl> <dbl>
# 1 a          1     1
# 2 a          1     1
# 3 a          1     1
# 4 a         81     2
# 5 a          1     2
# 6 a          1     2
# 7 a          1     2
# 8 a          2     3
# 9 a          1     3
#10 b          1     1
#11 b          1     1
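
To see why the cumulative sum produces block numbers, here is a minimal sketch that works through the diff values of user "a" alone, outside of dplyr. The variable names d, starts and block_a are made up for this illustration and do not appear in the question.

```r
# diff values for userID "a", copied from the question
d <- c(1, 1, 1, 81, 1, 1, 1, 2, 1)

# TRUE marks every row where a new block should begin
starts <- d > 1

# cumsum() treats TRUE as 1 and FALSE as 0, so it counts the
# block starts seen so far: 0 0 0 1 1 1 1 2 2
cumsum(starts)

# adding 1 shifts the numbering so the first block is 1
block_a <- 1 + cumsum(starts)
block_a
# 1 1 1 2 2 2 2 3 3
```

One thing to keep in mind: if the first row of a group already has diff > 1, this formula numbers that group's first block 2 rather than 1.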

Answer 1 (score: 1)

In base R, you can use ave like this:

data$block <- ave(data$diff>1, data$userID, FUN=cumsum)+1
#   userID diff block
#1       a    1     1
#2       a    1     1
#3       a    1     1
#4       a   81     2
#5       a    1     2
#6       a    1     2
#7       a    1     2
#8       a    2     3
#9       a    1     3
#10      b    1     1
#11      b    1     1
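
As a quick sanity check, the following sketch rebuilds the data frame from the question and compares the ave() result against the block column the question asks for. The name expected is introduced here only for the comparison.

```r
# input data from the question
data <- data.frame(
  userID = c("a","a","a","a","a","a","a","a","a","b","b"),
  diff   = c(1, 1, 1, 81, 1, 1, 1, 2, 1, 1, 1)
)

# desired block column, copied from the question
expected <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 1, 1)

# per-user cumulative count of rows with diff > 1, shifted to start at 1
data$block <- ave(data$diff > 1, data$userID, FUN = cumsum) + 1

# TRUE when every computed block matches the desired one
all(data$block == expected)
```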