Question

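Consider the following table df, where the categorical variables are labelled x1 and x2, and the numeric measures are labelled y1, y2 and y3: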

df <- data.frame(x1=sample(letters[1:3], 20, replace=TRUE),
           x2=sample(letters[4:6], 20, replace=TRUE),
           y1=rnorm(20), y2=rnorm(20), y3=rnorm(20))

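I would like to apply to it a function of the 3 numeric measures y, grouped by the categorical variables x. For example, the following function, whose input y is a table with 3 columns, should produce a new column: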

f <- function(y) { sum((y[, 1] - y[, 2]) / y[, 3]) }

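I tried aggregate, dplyr, summarizeBy, ... without success, because none of these methods seems to allow a function that mixes several input columns. Any ideas on how to apply this kind of function, i.e. with something like aggregate?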

aggregate(data = df, y1 + y2 + y3 ~ x1 + x2, FUN = f)

To clarify, the expected result can be obtained as follows:

groups <- unique(df[,c("x1", "x2")]) # co-occurrences of explanatory variables
res <- c()
for (i in 1:nrow(groups)){ # get the subtables
  temp <- df[df$x1 == groups[i,1] & df$x2 == groups[i,2], c("y1", "y2", "y3")]
  res <- c(res, f(temp)) # apply function on subtables
}
groups$res <- res # aggregate results

For this simple toy example this is not too bad, but for more complex data it is very impractical.

Answer 1

The problem lies in the input of the function. The way you specified it, it expects a data frame.
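To make this concrete, here is a minimal sketch of what aggregate actually hands to FUN. With cbind(y1, y2, y3) on the left-hand side, each column is passed separately as a plain numeric vector, one group at a time, so indexing with y[, 1] has nothing to work on. The seed and the helper names is_plain_vector and f_df are my own choices for illustration, not part of the question.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(x1 = sample(letters[1:3], 20, replace = TRUE),
                 x2 = sample(letters[4:6], 20, replace = TRUE),
                 y1 = rnorm(20), y2 = rnorm(20), y3 = rnorm(20))

# Probe (hypothetical helper): TRUE when FUN receives a dimensionless numeric vector.
is_plain_vector <- function(v) is.numeric(v) && is.null(dim(v))
probe <- aggregate(cbind(y1, y2, y3) ~ x1 + x2, data = df, FUN = is_plain_vector)
all_vectors <- all(unlist(probe[c("y1", "y2", "y3")]))

# The data-frame-style function from the question (renamed f_df here)
# therefore fails when aggregate() calls it on a single vector.
f_df <- function(y) sum((y[, 1] - y[, 2]) / y[, 3])
attempt <- tryCatch(
  aggregate(cbind(y1, y2, y3) ~ x1 + x2, data = df, FUN = f_df),
  error = function(e) e
)
failed <- inherits(attempt, "error")
```

Here all_vectors comes out TRUE and failed is TRUE, which is why no formula trick on the aggregate side gets the three columns into f together.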

A possible solution is to pass the function a list of columns. Only a slight modification of your function is needed:

f <- function(y) sum((y[[1]] - y[[2]]) / y[[3]])
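As a quick sanity check (a sketch with made-up numbers), the [[ ]] version accepts both a plain list of vectors and a data frame, since a data frame is itself a list of columns. The names f_list and f_df are only used here to keep the two variants apart.

```r
# List-style version from this answer and data-frame-style version from the question.
f_list <- function(y) sum((y[[1]] - y[[2]]) / y[[3]])
f_df   <- function(y) sum((y[, 1] - y[, 2]) / y[, 3])

# Small made-up table: every row contributes exactly 0.25.
y <- data.frame(y1 = c(1, 2, 3), y2 = c(0.5, 1, 1), y3 = c(2, 4, 8))

from_df       <- f_list(y)                        # data frame input
from_list     <- f_list(list(y$y1, y$y2, y$y3))   # plain list input
from_original <- f_df(y)                          # original function, for comparison
# All three are 0.75.
```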

You can now use it in a dplyr chain:

library(dplyr)

df %>%
  group_by(x1, x2) %>%
  summarise(sum_y = f(list(y1, y2, y3)))

which gives:

# A tibble: 9 x 3
# Groups:   x1 [?]
  x1    x2     sum_y
  <fct> <fct>  <dbl>
1 a     d      1.20 
2 a     e      0.457
3 a     f     -9.46 
4 b     d     -1.11 
5 b     e     -0.176
6 b     f     -1.34 
7 c     d     -0.994
8 c     e      3.38 
9 c     f     -2.63
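If you would rather stay in base R and keep the data-frame-style f from the question unchanged, one possible alternative (a sketch, not the approach above) is to split the numeric columns by both grouping variables and apply f to each piece. The seed and the names pieces and res are my own choices.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(x1 = sample(letters[1:3], 20, replace = TRUE),
                 x2 = sample(letters[4:6], 20, replace = TRUE),
                 y1 = rnorm(20), y2 = rnorm(20), y3 = rnorm(20))

# Original function from the question: expects a 3-column data frame.
f <- function(y) sum((y[, 1] - y[, 2]) / y[, 3])

# One sub-data-frame per (x1, x2) combination; drop = TRUE discards
# combinations that never occur in the data.
pieces <- split(df[, c("y1", "y2", "y3")], df[, c("x1", "x2")], drop = TRUE)

# Apply f to each piece; the result is a numeric vector named like "a.d", "b.e", ...
res <- sapply(pieces, f)
```

This does the same work as the explicit loop in the question, just without building the subsets by hand. The group labels end up in names(res) rather than in separate x1 and x2 columns, so the dplyr version is more convenient if you need a table back.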

Aggregate table by applying a multi-column function
