R: grouping based on consecutively occurring values

Date: 2018-05-16 05:11:04

Tags: r dataframe

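Hi, does anyone know how to create sub-factors or unique labels for consecutive runs of rows that share the same value or factor level?

So my data looks like this:

value   group   subgrouping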
1       a     a.1
5       a     a.1
2       a     a.1
3       b     b.1
2       b     b.1
5       b     b.1
2       b     b.1
1       b     b.1
3       b     b.1
2       a     a.2
5       a     a.2
5       a     a.2
6       a     a.2
6       a     a.2
2       a     a.2
1       a     a.2
0       c     c.1
3       c     c.1
3       c     c.1
2       b     b.2
1       b     b.2
3       a     a.3
2       b     b.3
3       b     b.3

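That way I could find the mean of a.2 on its own rather than the mean of all the a rows.

3 answers:

Answer 0 (score: 0)

I've found this trick works well in this situation. As written, it numbers the runs globally rather than tracking each group separately, but it may be enough: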

library(dplyr)
# subgroup_id goes up by 1 each time group differs from the previous row
df %>% 
  mutate(subgroup_id = cumsum(lag(group, default = group[1]) != group))
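To make the caveat concrete, here is a minimal base R sketch of the same idea on a small made-up data frame (the values are illustrative, not the asker's data). It assumes group is a character column. The run ids come out as 0, 1, 2, ... across the whole table rather than a.1, b.1, a.2, but they are already enough to take a mean per run:

```r
# Toy data, made up for illustration: runs of "a", "b", "a"
df <- data.frame(
  value = c(1, 5, 3, 2, 2, 6),
  group = c("a", "a", "b", "b", "a", "a"),
  stringsAsFactors = FALSE
)

# TRUE wherever a row's group differs from the previous row's group;
# the first row has no predecessor, so it counts as "no change"
changed <- c(FALSE, df$group[-1] != df$group[-nrow(df)])

# Running count of changes = id of the consecutive run (starts at 0,
# like the dplyr one-liner above)
df$subgroup_id <- cumsum(changed)

# Mean of value within each run, named by run id
run_means <- tapply(df$value, df$subgroup_id, mean)
run_means
```

Here the second run of "a" gets id 2, so run_means["2"] is the mean of that run alone.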

Answer 1 (score: 0)

Try rle:

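# rle() needs an atomic vector, so convert in case group is a factor
x <- rle(as.character(df$group))
# number the runs within each distinct group value: a.1, b.1, a.2, ...
x$values <- with(x, ave(values, values, FUN = function(v) paste0(v, '.', seq_along(v))))
# expand the run labels back to one label per row
df$subgrouping2 <- inverse.rle(x)
df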

# > df
#     value group subgrouping subgrouping2
# 1:     1     a         a.1          a.1
# 2:     5     a         a.1          a.1
# 3:     2     a         a.1          a.1
# 4:     3     b         b.1          b.1
# 5:     2     b         b.1          b.1
# 6:     5     b         b.1          b.1
# 7:     2     b         b.1          b.1
# 8:     1     b         b.1          b.1
# 9:     3     b         b.1          b.1
# 10:     2     a         a.2          a.2
# 11:     5     a         a.2          a.2
# 12:     5     a         a.2          a.2
# 13:     6     a         a.2          a.2
# 14:     6     a         a.2          a.2
# 15:     2     a         a.2          a.2
# 16:     1     a         a.2          a.2
# 17:     0     c         c.1          c.1
# 18:     3     c         c.1          c.1
# 19:     3     c         c.1          c.1
# 20:     2     b         b.2          b.2
# 21:     1     b         b.2          b.2
# 22:     3     a         a.3          a.3
# 23:     2     b         b.3          b.3
# 24:     3     b         b.3          b.3
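If the rle step is hard to follow, it may help to look at the intermediate pieces on a short vector. The vector below is made up for illustration; the calls are the same base R functions used above:

```r
g <- c("a", "a", "b", "a", "a", "a")   # made-up example vector

r <- rle(g)
r$values    # "a" "b" "a"   one entry per consecutive run
r$lengths   # 2 1 3         how many rows each run covers

# Number the runs separately within each distinct value
r$values <- ave(r$values, r$values,
                FUN = function(v) paste0(v, ".", seq_along(v)))
r$values    # "a.1" "b.1" "a.2"

# Repeat each run label by its run length to get one label per element
labels <- inverse.rle(r)
labels      # "a.1" "a.1" "b.1" "a.2" "a.2" "a.2"
```

The point is that ave only ever sees one entry per run, so seq_along counts runs, not rows.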

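Answer 2 (score: 0)

Using data.table: group by the run-length id of 'group' (rleid(group)) and take the first 'group' value and the number of observations (.N) in each run. Then, grouped by 'group', paste the group name with the sequence number of each run, order by 'ind' to restore the original run order, replicate each label n times, and assign the result to create 'subgroup2'.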

library(data.table)
sgrp <- setDT(df1)[, .(group = first(group), n = .N), 
  .(ind = rleid(group))][, .(paste(group, seq_len(.N), sep="."), n, ind), 
       group][order(ind), rep(V1, n)]
df1[, subgroup2 := sgrp]
df1
#    value group subgrouping subgroup2
# 1:     1     a         a.1       a.1
# 2:     5     a         a.1       a.1
# 3:     2     a         a.1       a.1
# 4:     3     b         b.1       b.1
# 5:     2     b         b.1       b.1
# 6:     5     b         b.1       b.1
# 7:     2     b         b.1       b.1
# 8:     1     b         b.1       b.1
# 9:     3     b         b.1       b.1
#10:     2     a         a.2       a.2
#11:     5     a         a.2       a.2
#12:     5     a         a.2       a.2
#13:     6     a         a.2       a.2
#14:     6     a         a.2       a.2
#15:     2     a         a.2       a.2
#16:     1     a         a.2       a.2
#17:     0     c         c.1       c.1
#18:     3     c         c.1       c.1
#19:     3     c         c.1       c.1
#20:     2     b         b.2       b.2
#21:     1     b         b.2       b.2
#22:     3     a         a.3       a.3
#23:     2     b         b.3       b.3
#24:     3     b         b.3       b.3
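A possibly shorter variant, offered as a sketch rather than as the code above: store the global run id first, then renumber the runs within each group with match. The table dt and the column names run and subgroup are made up for illustration. The last two lines show the per-subgroup mean the question asks for:

```r
library(data.table)

# Toy data, made up for illustration: runs of "a", "b", "a"
dt <- data.table(
  value = c(1, 5, 3, 2, 2, 6),
  group = c("a", "a", "b", "b", "a", "a")
)

# Global run id: increases by 1 whenever group changes
dt[, run := rleid(group)]

# Within each group, map its run ids to 1, 2, 3, ... in order of appearance
dt[, subgroup := paste(group, match(run, unique(run)), sep = "."), by = group]

# Mean of value per subgroup, then look up a.2 on its own
means <- dt[, .(mean_value = mean(value)), by = subgroup]
means[subgroup == "a.2", mean_value]
```

This modifies dt by reference, so the helper column run stays in the table unless it is removed afterwards with dt[, run := NULL].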