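Hello, does anyone know how to create sub-factors or unique markers for consecutive runs of rows that share the same value or factor?
My data looks like this:
value group subgrouping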
1 a a.1
5 a a.1
2 a a.1
3 b b.1
2 b b.1
5 b b.1
2 b b.1
1 b b.1
3 b b.1
2 a a.2
5 a a.2
5 a a.2
6 a a.2
6 a a.2
2 a a.2
1 a a.2
0 c c.1
3 c c.1
3 c c.1
2 b b.2
1 b b.2
3 a a.3
2 b b.3
3 b b.3
This way I can find the mean of a.2 on its own, rather than the mean of all the a rows.
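Answer 0 (score: 0)
I have found this trick works well in this situation. As written, it numbers the runs with a single global counter rather than counting separately within each group, but that may be enough: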
library(dplyr)
df %>%
  # the counter goes up by one each time `group` differs from the previous row
  mutate(subgroup_id = cumsum(lag(group, default = group[1]) != group))
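If per-group labels such as a.1, a.2 are needed, the global run counter above can be renumbered within each group. The following is only a sketch under assumptions: the small `df`, the `run_id` column name and the `match(run_id, unique(run_id))` renumbering step are illustrative and not part of the original answer.

```r
library(dplyr)

# Hypothetical toy data; `group` is stored as character
df <- data.frame(
  value = c(1, 5, 2, 3, 2, 5, 2, 3),
  group = c("a", "a", "a", "b", "b", "a", "a", "b"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # global run counter: goes up by one each time `group` changes
  mutate(run_id = cumsum(lag(group, default = group[1]) != group)) %>%
  group_by(group) %>%
  # within each group, renumber its runs 1, 2, 3, ... in order of appearance
  mutate(subgrouping = paste0(group, ".", match(run_id, unique(run_id)))) %>%
  ungroup()

res$subgrouping
```

For this toy input the labels should come out as a.1 for the first three rows, b.1 for the next two, a.2 for the following two, and b.2 for the last row.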
Answer 1 (score: 0)
Try rle:
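# run-length encode the group column (it must be a plain vector, not a factor)
x <- rle(df$group)
# number the runs of each group value: a.1, a.2, ..., b.1, b.2, ...
x$values <- with(x, ave(values, values, FUN = function(v) paste0(v, '.', seq_along(v))))
# expand the labelled runs back to one label per row
df$subgrouping2 <- inverse.rle(x)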
df
# value group subgrouping subgrouping2
# 1: 1 a a.1 a.1
# 2: 5 a a.1 a.1
# 3: 2 a a.1 a.1
# 4: 3 b b.1 b.1
# 5: 2 b b.1 b.1
# 6: 5 b b.1 b.1
# 7: 2 b b.1 b.1
# 8: 1 b b.1 b.1
# 9: 3 b b.1 b.1
# 10: 2 a a.2 a.2
# 11: 5 a a.2 a.2
# 12: 5 a a.2 a.2
# 13: 6 a a.2 a.2
# 14: 6 a a.2 a.2
# 15: 2 a a.2 a.2
# 16: 1 a a.2 a.2
# 17: 0 c c.1 c.1
# 18: 3 c c.1 c.1
# 19: 3 c c.1 c.1
# 20: 2 b b.2 b.2
# 21: 1 b b.2 b.2
# 22: 3 a a.3 a.3
# 23: 2 b b.3 b.3
# 24: 3 b b.3 b.3
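Once the labels exist, the per-subgroup mean the question asks for is a one-liner. This is a minimal base R sketch on a hypothetical toy `df` (not the asker's data); `tapply` is just one way to do it, and `aggregate` would work as well.

```r
# Hypothetical toy data; `group` must be character for rle()
df <- data.frame(
  value = c(1, 5, 2, 3, 2, 2, 5, 5),
  group = c("a", "a", "a", "b", "b", "a", "a", "a"),
  stringsAsFactors = FALSE
)

# Label each run as in the rle approach
x <- rle(df$group)
x$values <- ave(x$values, x$values, FUN = function(v) paste0(v, ".", seq_along(v)))
df$subgrouping2 <- inverse.rle(x)

# Mean of `value` for each labelled subgroup
means <- tapply(df$value, df$subgrouping2, mean)
means[["a.2"]]
```

Here `means[["a.2"]]` covers only the second run of a (values 2, 5, 5), not every a row. If `group` is a factor, converting it with `as.character()` first avoids an error from `rle`.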
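Answer 2 (score: 0)
Using data.table: grouped by the run-length ID of 'group' (rleid(group)), get the first 'group' value and the number of observations (.N). Then, grouped by 'group', paste 'group' together with the sequence number of each run, order by 'ind', replicate each label 'n' times, and assign the result to create 'subgroup2'.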
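library(data.table)
sgrp <- setDT(df1)[, .(group = first(group), n = .N), .(ind = rleid(group))  # one row per run
  ][, .(paste(group, seq_len(.N), sep = "."), n, ind), group  # number the runs within each group
  ][order(ind), rep(V1, n)]  # back to run order, one label per original row
df1[, subgroup2 := sgrp]
df1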
# value group subgrouping subgroup2
# 1: 1 a a.1 a.1
# 2: 5 a a.1 a.1
# 3: 2 a a.1 a.1
# 4: 3 b b.1 b.1
# 5: 2 b b.1 b.1
# 6: 5 b b.1 b.1
# 7: 2 b b.1 b.1
# 8: 1 b b.1 b.1
# 9: 3 b b.1 b.1
#10: 2 a a.2 a.2
#11: 5 a a.2 a.2
#12: 5 a a.2 a.2
#13: 6 a a.2 a.2
#14: 6 a a.2 a.2
#15: 2 a a.2 a.2
#16: 1 a a.2 a.2
#17: 0 c c.1 c.1
#18: 3 c c.1 c.1
#19: 3 c c.1 c.1
#20: 2 b b.2 b.2
#21: 1 b b.2 b.2
#22: 3 a a.3 a.3
#23: 2 b b.3 b.3
#24: 3 b b.3 b.3
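A shorter variant of the same idea applies `rleid` twice: once globally to identify runs, then once within each group to renumber them. This is a sketch under assumptions, not the answerer's code: the table `dt` and the helper column `run` are hypothetical names used for illustration.

```r
library(data.table)

# Hypothetical toy data
dt <- data.table(
  value = c(1, 5, 2, 3, 2, 2, 5, 5),
  group = c("a", "a", "a", "b", "b", "a", "a", "a")
)

# Global run ID: changes whenever `group` changes
dt[, run := rleid(group)]
# Within each group, renumber the runs 1, 2, ... and build the label
dt[, subgroup2 := paste(group, rleid(run), sep = "."), by = group]

# Mean of `value` per labelled subgroup
means <- dt[, .(mean_value = mean(value)), by = subgroup2]
means
```

Grouping by `subgroup2` then gives the mean of a.2 separately from a.1, which is what the question was after.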