This is an extension of my previous question:
Creating groups based on running totals against a value
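Previous question: "I have data which is unique at one variable Y. Another variable Z tells me how many people are in each of Y. My problem is that I want to create groups of 45 from these Y and Z. I mean that whenever the running total of Z touches 45, one group is made and the code moves on to create the next group."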
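Extension to the question: what if the variable X is now also changing? For example, it can be A for a while and then become B. How do I prevent the code from generating groups that span two categories of X? For example, if Group = 3, how can I make sure that group 3 does not contain rows from both category A and category B?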
Previously, I used two answers, one from @tmfmnk:
df %>%
  mutate(Cumsum = accumulate(Z, ~ if_else(.x >= 45, .y, .x + .y)),
         Group = cumsum(Cumsum >= 45),
         Group = if_else(Group > lag(Group, default = first(Group)), lag(Group), Group) + 1)
and one from @G. Grothendieck:
Accum <- function(acc, x) if (acc < 45) acc + x else x
r <- Reduce(Accum, DF$Z, accumulate = TRUE)
g <- rev(cumsum(rev(r) >= 45))
g <- max(g) - g + 1
transform(DF, Cumsum = r, Group = g)
Both pieces of code solve the first problem.
My data looks something like this
ID X Y Z
1 A A 1
2 A B 5
3 A C 2
4 A D 42
5 A E 10
6 A F 2
7 A G 0
8 A H 3
9 A I 0
10 A J 8
11 A K 19
12 B L 4
13 B M 1
14 B N 1
15 B O 2
16 B P 0
17 B Q 1
18 B R 2
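To illustrate the issue, here is a minimal base R sketch that applies the second snippet above to this data and counts how many X categories end up in each group. The name cats_per_group is only for illustration and is not part of either answer.

```r
DF <- data.frame(
  ID = 1:18,
  X = c(rep("A", 11), rep("B", 7)),
  Y = LETTERS[1:18],
  Z = c(1, 5, 2, 42, 10, 2, 0, 3, 0, 8, 19, 4, 1, 1, 2, 0, 1, 2)
)

# Running total that restarts once it has reached 45
Accum <- function(acc, x) if (acc < 45) acc + x else x
r <- Reduce(Accum, DF$Z, accumulate = TRUE)
g <- rev(cumsum(rev(r) >= 45))
g <- max(g) - g + 1

# Number of distinct X categories inside each group (illustrative name)
cats_per_group <- tapply(DF$X, g, function(x) length(unique(x)))
cats_per_group
```

Here group 2 contains rows from both A and B, because the running total carries over from row 11 into row 12. That is what I want to avoid.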
I want something like this:
ID X Y Z CumSum Group
1 A A 1 1 1
2 A B 5 6 1
3 A C 2 8 1
4 A D 42 50 1
5 A E 10 10 2
6 A F 2 12 2
7 A G 0 12 2
8 A H 3 15 2
9 A I 0 15 2
10 A J 8 23 2
11 A K 19 42 2
12 B L 4 4 3
13 B M 1 5 3
14 B N 1 6 3
15 B O 2 8 3
16 B P 0 8 3
17 B Q 1 9 3
18 B R 2 11 3
Please let me know what I can do.
Answer 0 (score: 1)
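Maybe not the sexiest solution, but I think it does what you want.
It uses a split-apply-combine approach with the newer group_split function from dplyr. A variable maxval keeps track of the highest group number used so far and is added to the group numbers of the next data frame.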
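library(dplyr)
library(purrr)  # provides accumulate()

df <- data.frame(
  ID = 1:18,
  X = c(rep("A", 11), rep("B", 7)),
  Y = LETTERS[1:18],
  Z = c(1, 5, 2, 42, 10, 2, 0, 3, 0, 8, 19, 4, 1, 1, 2, 0, 1, 2)
)

# Split into one data frame per category of X
listofdfs <- df %>%
  group_split(X)
listofdfs

# Store the results in a separate plain list instead of overwriting listofdfs
results <- vector("list", length(listofdfs))
maxval <- 0  # highest group number used so far
for (i in seq_along(listofdfs)) {
  results[[i]] <- listofdfs[[i]] %>%
    mutate(Cumsum = accumulate(Z, ~ if_else(.x >= 45, .y, .x + .y)),
           Group = cumsum(Cumsum >= 45),
           Group = if_else(Group > lag(Group, default = first(Group)), lag(Group), Group) + 1 + maxval)
  maxval <- max(results[[i]]$Group)
}
results

# Combine the pieces back into one table (rbindlist comes from data.table)
result <- data.table::rbindlist(results)
result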
ID X Y Z Cumsum Group
1: 1 A A 1 1 1
2: 2 A B 5 6 1
3: 3 A C 2 8 1
4: 4 A D 42 50 1
5: 5 A E 10 10 2
6: 6 A F 2 12 2
7: 7 A G 0 12 2
8: 8 A H 3 15 2
9: 9 A I 0 15 2
10: 10 A J 8 23 2
11: 11 A K 19 42 2
12: 12 B L 4 4 3
13: 13 B M 1 5 3
14: 14 B N 1 6 3
15: 15 B O 2 8 3
16: 16 B P 0 8 3
17: 17 B Q 1 9 3
18: 18 B R 2 11 3
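For comparison, the same split-apply-combine idea can be sketched in base R only, with no packages. This is an illustrative variant, not part of the answer above: the helper group_within and the variables pieces and offset are hypothetical names, and it assumes that each category of X occupies a contiguous block of rows and that the sorted category order matches the row order (as with A and B here).

```r
df <- data.frame(
  ID = 1:18,
  X = c(rep("A", 11), rep("B", 7)),
  Y = LETTERS[1:18],
  Z = c(1, 5, 2, 42, 10, 2, 0, 3, 0, 8, 19, 4, 1, 1, 2, 0, 1, 2)
)

# Hypothetical helper: running total that restarts after reaching `limit`,
# plus a group counter that increases on the row after each restart point.
group_within <- function(z, limit = 45) {
  acc <- Reduce(function(a, x) if (a < limit) a + x else x, z, accumulate = TRUE)
  starts <- c(TRUE, head(acc, -1) >= limit)
  data.frame(Cumsum = acc, Group = cumsum(starts))
}

# Apply the helper separately to each category of X
pieces <- lapply(split(df, df$X), function(d) cbind(d, group_within(d$Z)))

# Shift each category's group numbers past those already used
offset <- 0
for (i in seq_along(pieces)) {
  pieces[[i]]$Group <- pieces[[i]]$Group + offset
  offset <- max(pieces[[i]]$Group)
}

result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

Because the running total is computed inside each category, no group can contain rows from two different values of X.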