How can I group by a variable and summarise using ddply?
For example:
library(plyr)
sample <- function(x, g){
  print(g)
  print(x[[g]])
  res = ddply(x, ~x[[g]], summarise, value = mean(value))
  return(res)
}
x = data.frame(type = c('a', 'a', 'a', 'b'),
               age = c(20, 21, 21, 10),
               value = c(100, 120, 121, 150))
sample(x = x, g = 'age')
This fails with:
Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, :
object 'g' not found
even though the function prints:
[1] "age"
[1] 20 21 21 10
Why does R find g when printing, but not when grouping?
Edit: I would like the output to be:
  x[["age"]] value
1         10 150.0
2         20 100.0
3         21 120.5
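Answer 0 (score: 0)
This is a scoping issue. An argument passed with '=' exists only inside the function, whereas 'g <- ...' written in the call also assigns g in the calling environment. Try calling your function this way: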
sample(x = x, g <- 'age')
Or you can simply use
# g instead of ~x[[g]]
res = ddply(x, g, summarise, value = mean(value))
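For reference, the second suggestion can be written out as a complete script. This is only a minimal sketch: the helper name mean_by is hypothetical (chosen so that base::sample is not masked), and it assumes g is always a single column name given as a string. Note that the grouping column in the result is then called age rather than x[["age"]].

```r
library(plyr)

# Hypothetical helper name; the original question calls it `sample`,
# which masks base::sample().
mean_by <- function(df, g) {
  # g is a character column name; ddply accepts it directly as .variables,
  # so nothing has to be looked up in an outer environment.
  ddply(df, g, summarise, value = mean(value))
}

x <- data.frame(type = c("a", "a", "a", "b"),
                age = c(20, 21, 21, 10),
                value = c(100, 120, 121, 150))

res <- mean_by(x, "age")
print(res)
```

Passing the name as a string avoids relying on a variable g being visible outside the function.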
Answer 1 (score: 0)
Here is a solution using the dplyr package.
To get group_by to evaluate correctly, I needed to use group_by_, which is due to be deprecated.
library(dplyr)
x = data.frame(type = c('a', 'a', 'a', 'b'),
               age = c(20, 21, 21, 10),
               value = c(100, 120, 121, 150))
sample <- function(x, g){
  print(g)
  print(x[[g]])
  res <- group_by_(x, g) %>% summarise(mean(value))
  #res = ddply(x, ~x[[g]], summarise, value = mean(value))
  return(res)
}
sample(x = x, g = 'age')
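Since group_by_ is on its way out, one possible replacement for a column name passed as a string is across() together with all_of(). This is a sketch, not the answerer's code: it assumes dplyr 1.0.0 or later, and the helper name mean_by_name is hypothetical.

```r
library(dplyr)

# Hypothetical helper name; assumes dplyr >= 1.0.0 for across().
mean_by_name <- function(df, g) {
  df %>%
    # all_of() selects the column whose name is stored in the string g
    group_by(across(all_of(g))) %>%
    summarise(value = mean(value))
}

x <- data.frame(type = c("a", "a", "a", "b"),
                age = c(20, 21, 21, 10),
                value = c(100, 120, 121, 150))

res <- mean_by_name(x, "age")
print(res)
```

The call keeps the same shape as before (a string for g), so existing callers would not need to change.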
Answer 2 (score: 0)
I would use tidyeval, which ships with the latest dplyr version:
library(dplyr)  # provides %>%, group_by, summarise and enquo
sample <- function(x, g){
  # capture the unquoted column name passed as g
  var <- dplyr::enquo(g)
  res = x %>% group_by(!!var) %>% summarise(age_mean = mean(value))
  return(res)
}
x = data.frame(type = c('a', 'a', 'a', 'b'),
               age = c(20, 21, 21, 10),
               value = c(100, 120, 121, 150))
sample(x, age)
# A tibble: 3 x 2
    age age_mean
  <dbl>    <dbl>
1    10    150.0
2    20    100.0
3    21    120.5
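The enquo and !! pair can probably be shortened with the embrace operator {{ }}. This is a sketch under the assumption that rlang 0.4.0 or later is installed alongside dplyr; the helper name mean_by_col and the output column name value_mean are hypothetical.

```r
library(dplyr)

# Hypothetical helper name; assumes rlang >= 0.4.0, which provides {{ }}.
mean_by_col <- function(df, g) {
  df %>%
    # {{ g }} captures the unquoted column and unquotes it in one step
    group_by({{ g }}) %>%
    summarise(value_mean = mean(value))
}

x <- data.frame(type = c("a", "a", "a", "b"),
                age = c(20, 21, 21, 10),
                value = c(100, 120, 121, 150))

res <- mean_by_col(x, age)
print(res)
```

As with the enquo version, the column is passed unquoted (age, not "age").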