Question

This question is a follow-up to "Aggregating if each observation can belong to multiple groups".

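In the linked question, my observations belonged to several groups. Now I have two grouping variables, which makes the problem harder (at least for me). In the example below, an observation can belong to one or more of the groups A, B, C. In addition, I would like to differentiate by another factor, namely x < 1, x < .5, or x < 0. Since every x smaller than 0 is also smaller than 1, each observation can again belong to multiple groups. I would like to aggregate according to both groupings (A, B, C and x < 1, x < .5, x < 0) and get the aggregate for all combinations: (A and x < 1), (A and x < .5), ..., (C and x < 0). Please tell me if the question is not sufficiently clear, and feel free to edit the title, as I was not able to find a proper one.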

# The data
library(data.table)
n <- 500
set.seed(1)
TF <- c(TRUE, FALSE)
time <- rep(1:4, each = n/4)


df <- data.table(time = time, x = rnorm(n), groupA = sample(TF, size = n, replace = TRUE),
                 groupB = sample(TF, size = n, replace = TRUE),
                 groupC = sample(TF, size = n, replace = TRUE))

df[ ,c("smaller1", "smaller.5", "smaller0") := .(x <= 1, x <= 0.5, x <= 0)]

# The result should look like this (a solution for wide format would be nice as well) but less repetitive
rbind(
df[smaller1 == TRUE , .(lapply(.SD*x, sum), c("A_smaller1", "B_smaller1", "C_smaller1")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")],
df[smaller.5 == TRUE , .(lapply(.SD*x, sum), c("A_smaller.5", "B_smaller.5", "C_smaller.5")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")],
df[smaller0 == TRUE , .(lapply(.SD*x, sum), c("A_smaller0", "B_smaller0", "C_smaller0")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")]
)
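One possible way to cut the repetition, offered only as a sketch and not as an accepted answer: build every (group, threshold) pair with `CJ()`, aggregate once per pair, and stack the pieces with `rbindlist()`. The names `group_cols`, `thresh_cols`, `combos`, `res_long` and `res_wide` are made up for this illustration. Unlike the `rbind()` call above, this version filters on the group flag instead of multiplying by it, so a (time, group, threshold) cell with no matching rows would be dropped rather than reported as 0.

```r
library(data.table)

# Same toy data as in the question
set.seed(1)
n  <- 500
TF <- c(TRUE, FALSE)
df <- data.table(time = rep(1:4, each = n / 4), x = rnorm(n),
                 groupA = sample(TF, size = n, replace = TRUE),
                 groupB = sample(TF, size = n, replace = TRUE),
                 groupC = sample(TF, size = n, replace = TRUE))
df[, c("smaller1", "smaller.5", "smaller0") := .(x <= 1, x <= 0.5, x <= 0)]

# Hypothetical helper vectors naming the two sets of membership flags
group_cols  <- c("groupA", "groupB", "groupC")
thresh_cols <- c("smaller1", "smaller.5", "smaller0")

# All 3 x 3 combinations of group flag and threshold flag
combos <- CJ(group = group_cols, thresh = thresh_cols, sorted = FALSE)

# For each combination, keep rows where both flags are TRUE and sum x by time
res_long <- rbindlist(lapply(seq_len(nrow(combos)), function(i) {
  g  <- combos$group[i]
  th <- combos$thresh[i]
  df[df[[g]] & df[[th]],
     .(group = g, threshold = th, sum_x = sum(x)),
     by = time]
}))

# Wide format: one row per time, one column per group/threshold combination
res_wide <- dcast(res_long, time ~ group + threshold, value.var = "sum_x")
```

`res_long` has one row per time and combination, and `res_wide` has columns such as `groupA_smaller1`. With many more flags, melting both sets of flag columns might scale better than the loop, but that has not been tried here.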

Aggregating if each observation can belong to multiple groups with several grouping variables

0 answers: