R: Create groups based on non-repeating running totals

Date: 2019-06-26 08:57:01

Tags: r

This extends my previous question:

Creating groups based on running totals against a value

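Previous question: "I have data which is unique at one variable Y. Another variable Z tells me how many people are in each of Y. My problem is that I want to create groups of 45 from these Y and Z. I mean that whenever the running total of Z touches 45, one group is made and the code moves on to create the next group."

Extension of the question: what if there is now another variable X that also changes? For example, it can be A for a while and then become B. How do I prevent the code from creating groups that span two categories of X? For example, if Group = 3, how can I make sure that group 3 does not contain rows from both category A and category B?

Previously, I used the two answers, from @tmfmnk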

df %>% 
 mutate(Cumsum = accumulate(Z, ~ if_else(.x >= 45, .y, .x + .y)),
        Group = cumsum(Cumsum >= 45),
        Group = if_else(Group > lag(Group, default = first(Group)), lag(Group), Group) + 1)

and from @G. Grothendieck

Accum <- function(acc, x) if (acc < 45)  acc + x else x
r <- Reduce(Accum, DF$Z, accumulate = TRUE)
g <- rev(cumsum(rev(r) >= 45))
g <- max(g) - g + 1

transform(DF, Cumsum = r, Group = g)

Both pieces of code solve the first problem.
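To make the new problem concrete, here is a minimal sketch that runs the base R version on the sample data from this question and counts how many distinct values of X end up in each group. The helper name `x_per_group` is only illustrative and not part of either answer.

```r
# Sample data from this question
DF <- data.frame(
  ID = 1:18,
  X  = c(rep("A", 11), rep("B", 7)),
  Y  = LETTERS[1:18],
  Z  = c(1, 5, 2, 42, 10, 2, 0, 3, 0, 8, 19, 4, 1, 1, 2, 0, 1, 2),
  stringsAsFactors = FALSE
)

# Running total that restarts once it has reached 45 (ignores X entirely)
Accum <- function(acc, x) if (acc < 45) acc + x else x
r <- Reduce(Accum, DF$Z, accumulate = TRUE)
g <- rev(cumsum(rev(r) >= 45))
g <- max(g) - g + 1

res <- transform(DF, Cumsum = r, Group = g)

# Number of distinct X categories inside each group
x_per_group <- tapply(res$X, res$Group, function(x) length(unique(x)))
print(x_per_group)
```

On this data, the row with ID 12 (X = B) is placed in group 2 together with rows where X = A, which is exactly the overlap I want to avoid.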

My data looks something like this


ID  X   Y   Z
1   A   A   1
2   A   B   5
3   A   C   2
4   A   D   42
5   A   E   10
6   A   F   2
7   A   G   0
8   A   H   3
9   A   I   0
10  A   J   8
11  A   K   19
12  B   L   4
13  B   M   1
14  B   N   1
15  B   O   2
16  B   P   0
17  B   Q   1
18  B   R   2

I want something like this

ID  X   Y   Z   CumSum  Group
1   A   A   1   1   1
2   A   B   5   6   1
3   A   C   2   8   1
4   A   D   42  50  1
5   A   E   10  10  2
6   A   F   2   12  2
7   A   G   0   12  2
8   A   H   3   15  2
9   A   I   0   15  2
10  A   J   8   23  2
11  A   K   19  42  2
12  B   L   4   4   3
13  B   M   1   5   3
14  B   N   1   6   3
15  B   O   2   8   3
16  B   P   0   8   3
17  B   Q   1   9   3
18  B   R   2   11  3

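Please tell me what I can do.

1 answer:

Answer 0: (score: 1)

Maybe not the sexiest solution, but I think it does what you want.

It uses a split-apply-combine approach with the new group_split function from dplyr. Define a maxval that keeps track of the highest group number so far, and always add it to the group numbers in the next data frame.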

library(dplyr)
library(purrr)  # provides accumulate()

df <- data.frame(
    ID = 1:18,
    X = c(rep("A", 11), rep("B", 7)),
    Y = LETTERS[1:18],
    Z = c(1,5,2,42,10,2,0,3,0,8,19,4,1,1,2,0,1,2)
)

# Split into one data frame per category of X
listofdfs <- df %>%
    group_split(X)
listofdfs

# Collect the results in a plain list; maxval is the highest group number used so far
results <- vector("list", length(listofdfs))
maxval <- 0

for (i in seq_along(listofdfs)) {
    results[[i]] <- listofdfs[[i]] %>%
        mutate(Cumsum = accumulate(Z, ~ if_else(.x >= 45, .y, .x + .y)),
               Group = cumsum(Cumsum >= 45),
               Group = if_else(Group > lag(Group, default = first(Group)), lag(Group), Group) + 1 + maxval)
    maxval <- max(results[[i]]$Group)
}

results

# rbindlist() comes from the data.table package
result <- data.table::rbindlist(results)
result


    ID X Y  Z Cumsum Group
 1:  1 A A  1      1     1
 2:  2 A B  5      6     1
 3:  3 A C  2      8     1
 4:  4 A D 42     50     1
 5:  5 A E 10     10     2
 6:  6 A F  2     12     2
 7:  7 A G  0     12     2
 8:  8 A H  3     15     2
 9:  9 A I  0     15     2
10: 10 A J  8     23     2
11: 11 A K 19     42     2
12: 12 B L  4      4     3
13: 13 B M  1      5     3
14: 14 B N  1      6     3
15: 15 B O  2      8     3
16: 16 B P  0      8     3
17: 17 B Q  1      9     3
18: 18 B R  2     11     3
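For comparison, the same idea can be sketched in base R without a loop. This is not the code from the answer above, just an illustration under two assumptions: rows with the same X are contiguous, and the threshold is 45. The helper names `running_total` and `local_group` are made up for this example.

```r
df <- data.frame(
  ID = 1:18,
  X  = c(rep("A", 11), rep("B", 7)),
  Y  = LETTERS[1:18],
  Z  = c(1, 5, 2, 42, 10, 2, 0, 3, 0, 8, 19, 4, 1, 1, 2, 0, 1, 2),
  stringsAsFactors = FALSE
)

# Running total that restarts on the row after it has reached the limit
running_total <- function(z, limit = 45) {
  Reduce(function(acc, x) if (acc < limit) acc + x else x, z, accumulate = TRUE)
}

# Group index within one category: a new group starts after each row
# whose running total has reached the limit
local_group <- function(cs, limit = 45) {
  hit <- cs >= limit
  cumsum(c(0, head(hit, -1))) + 1
}

# ave() applies each helper separately within every category of X
df$Cumsum <- ave(df$Z, df$X, FUN = running_total)
df$Local  <- ave(df$Cumsum, df$X, FUN = local_group)

# Turn (X, Local) pairs into consecutive global group numbers
key <- paste(df$X, df$Local)
df$Group <- match(key, unique(key))
df$Local <- NULL

print(df)
```

Because the running total is computed separately for each value of X, a group can never contain rows from two categories. Numbering the (X, Local) pairs in order of appearance plays the same role as maxval in the loop.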