Question

How can I group by a variable passed into a function and summarise with ddply?

For example:

library(plyr)

sample <- function(x, g){
  print(g)
  print(x[[g]])
  res = ddply(x, ~x[[g]], summarise, value = mean(value))
  return(res)
}

x = data.frame(type = c('a', 'a', 'a', 'b'), 
               age = c(20, 21, 21, 10), 
               value = c(100, 120, 121, 150))
sample(x = x, g = 'age')

This fails with:

 Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x,  : 
  object 'g' not found

even though the function prints:

[1] "age"
[1] 20 21 21 10

Why does R find g when printing, but not when grouping?

Edit: I would like the output to be:

  x[["age"]] value
1         10 150.0
2         20 100.0
3         21 120.5

Answer 1

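This is about the environment in which the grouping formula is evaluated. Passing the argument with '=' only binds g inside the function, whereas `g <- 'age'` also assigns g in the calling environment, which appears to be where ddply looks it up. Try calling your function this way: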

sample(x = x, g <- 'age')

Or you can simply pass the column name itself as the grouping argument:

# g instead of ~x[[g]]
res = ddply(x, g, summarise, value = mean(value))
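
For anyone who wants to run this second suggestion end to end, here is a minimal sketch. It relies on ddply accepting a character vector of column names as its grouping argument. The function name `summarise_by` is hypothetical, chosen here only to avoid masking `base::sample`. Note that the grouping column comes out named `age` rather than `x[["age"]]`.

```r
library(plyr)

# Hypothetical helper name; the question's own function was called `sample`.
summarise_by <- function(x, g) {
  # g is a column name given as a string, e.g. "age";
  # ddply accepts a character vector of grouping variables.
  ddply(x, g, summarise, value = mean(value))
}

x <- data.frame(type  = c('a', 'a', 'a', 'b'),
                age   = c(20, 21, 21, 10),
                value = c(100, 120, 121, 150))

res <- summarise_by(x, "age")
print(res)
```

Passing the string directly means no formula is built, so nothing has to be looked up outside the function.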

Answer 2

Here is a solution using the dplyr package. To get the grouping to evaluate correctly, I had to use group_by_, which is going to be deprecated.

library(dplyr)

x = data.frame(type = c('a', 'a', 'a', 'b'), 
               age = c(20, 21, 21, 10), 
               value = c(100, 120, 121, 150))

sample <- function(x, g){
  print(g)
  print(x[[g]])
  res <- group_by_(x, g) %>% summarise(mean(value))
  #res = ddply(x, ~x[[g]], summarise, value = mean(value))
  return(res)
}

sample(x = x, g = 'age')
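
Since group_by_ is deprecated, a possible replacement that still takes the column name as a string is sketched below. It assumes dplyr 1.0.0 or later, where `across()` and `all_of()` can be used inside `group_by()`. The name `summarise_by` is again hypothetical.

```r
library(dplyr)

# Hypothetical helper name; assumes dplyr >= 1.0.0.
summarise_by <- function(x, g) {
  # all_of(g) selects the column whose name is stored in the string g
  x %>%
    group_by(across(all_of(g))) %>%
    summarise(value = mean(value))
}

x <- data.frame(type  = c('a', 'a', 'a', 'b'),
                age   = c(20, 21, 21, 10),
                value = c(100, 120, 121, 150))

res <- summarise_by(x, "age")
print(res)
```

If plyr is loaded after dplyr in the same session, its `summarise` masks dplyr's, so in that case it may be safer to write `dplyr::summarise` explicitly.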

Answer 3

I would use tidyeval, which ships with recent versions of dplyr:

library(dplyr)

sample <- function(x, g){
  # capture the unquoted column name passed as g
  var <- enquo(g)
  res = x %>% group_by(!!var) %>% summarise(age_mean = mean(value))
  return(res)
}

x = data.frame(type = c('a', 'a', 'a', 'b'), 
           age = c(20, 21, 21, 10), 
           value = c(100, 120, 121, 150))
sample(x, age)

# A tibble: 3 x 2
     age age_mean
  <dbl>    <dbl>
1    10    150.0
2    20    100.0
3    21    120.5
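
The same idea can be written more compactly with the embrace operator `{{ }}`, which combines the `enquo()` and `!!` steps. This is only a sketch, assuming rlang 0.4.0 or later with a matching dplyr. The names `summarise_by` and `value_mean` are hypothetical.

```r
library(dplyr)

# Hypothetical helper name; assumes rlang >= 0.4.0 for the {{ }} operator.
summarise_by <- function(x, g) {
  # {{ g }} forwards the unquoted column name the caller supplied
  x %>%
    group_by({{ g }}) %>%
    summarise(value_mean = mean(value))
}

x <- data.frame(type  = c('a', 'a', 'a', 'b'),
                age   = c(20, 21, 21, 10),
                value = c(100, 120, 121, 150))

res <- summarise_by(x, age)
print(res)
```

Here the caller passes a bare column name (`age`), not a string, just as in the enquo version.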

Grouping and summarising by a variable inside a function