Aggregating if each observation can belong to multiple groups with multiple grouping variables

Time: 2018-05-23 11:27:52

Tags: r dplyr data.table tidyverse

This question is a follow-up to "Aggregating if each observation can belong to multiple groups".

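In the linked question, each observation could belong to several groups. Now I have two grouping variables, which makes the problem harder (at least for me). In the example below, an observation can belong to one or more of the groups A, B, C. In addition, I want to differentiate by another factor, namely x < 1, x < .5, or x < 0. Since every x smaller than 0 is also smaller than 1, each observation can again belong to multiple groups. I want to aggregate by both groupings (A, B, C and x < 1, x < .5, x < 0) and get the aggregate for every combination: (A and x < 1), (A and x < .5), ..., (C and x < 0).

Please let me know if the question is not clear enough, and feel free to edit the title, as I was unable to find a suitable one.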

# The data
library(data.table)
n <- 500
set.seed(1)
TF <- c(TRUE, FALSE)
time <- rep(1:4, each = n/4)


df <- data.table(time = time, x = rnorm(n), groupA = sample(TF, size = n, replace = TRUE),
                 groupB = sample(TF, size = n, replace = TRUE),
                 groupC = sample(TF, size = n, replace = TRUE))

df[ ,c("smaller1", "smaller.5", "smaller0") := .(x <= 1, x <= 0.5, x <= 0)]

# The result should look like this (a solution for wide format would be nice as well) but less repetitive
rbind(
df[smaller1 == TRUE , .(lapply(.SD*x, sum), c("A_smaller1", "B_smaller1", "C_smaller1")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")],
df[smaller.5 == TRUE , .(lapply(.SD*x, sum), c("A_smaller.5", "B_smaller.5", "C_smaller.5")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")],
df[smaller0 == TRUE , .(lapply(.SD*x, sum), c("A_smaller0", "B_smaller0", "C_smaller0")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")]
)
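
One possible way to get the same sums with less repetition is to melt the table twice, once over the group flags and once over the threshold flags, and then aggregate once. This is only a sketch under the assumptions of the example above: the names `group_cols`, `thresh_cols`, `in_group`, `below`, `sum_x`, `res_long` and `res_wide` are made up for illustration, and a (time, group, threshold) combination with no matching rows is simply absent from the result instead of showing a sum of 0.

```r
library(data.table)

# Same simulated data as in the question
n <- 500
set.seed(1)
TF <- c(TRUE, FALSE)
df <- data.table(time = rep(1:4, each = n / 4), x = rnorm(n),
                 groupA = sample(TF, size = n, replace = TRUE),
                 groupB = sample(TF, size = n, replace = TRUE),
                 groupC = sample(TF, size = n, replace = TRUE))
df[, c("smaller1", "smaller.5", "smaller0") := .(x <= 1, x <= 0.5, x <= 0)]

group_cols  <- c("groupA", "groupB", "groupC")       # first grouping
thresh_cols <- c("smaller1", "smaller.5", "smaller0") # second grouping

# Step 1: one row per (observation, group); keep only actual memberships
long <- melt(df, id.vars = c("time", "x", thresh_cols),
             measure.vars = group_cols,
             variable.name = "group", value.name = "in_group")
long <- long[in_group == TRUE]

# Step 2: one row per (observation, group, threshold); keep only thresholds met
long <- melt(long, id.vars = c("time", "x", "group"),
             measure.vars = thresh_cols,
             variable.name = "thresh", value.name = "below")
long <- long[below == TRUE]

# Step 3: a single aggregation over all combinations (long format)
res_long <- long[, .(sum_x = sum(x)), by = .(time, group, thresh)]

# Optional: wide format, one column per (group, threshold) combination
res_wide <- dcast(res_long, time ~ group + thresh, value.var = "sum_x")
```

Because an observation appears once for every group and every threshold it satisfies, overlapping memberships in both groupings are handled by the reshaping itself, so adding a fourth group or threshold only means extending `group_cols` or `thresh_cols`.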

0 answers:

No answers