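I've looked in several places but can't work out how to do this. The approach seems to have changed several times, which makes it even more confusing.
I want to summarise NumbOfBx by Endoscopist as part of a function. I have the following data frame: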
vv <- structure(list(Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ",
"John Boy ", "John Boy ", "John Boy ", "Mar Gret ", "John Boy ",
"Mar Gret ", "Phil Ip ", "Phil Ip "), NumbOfBx = c(2, 4, NA,
2, 12, 12, NA, NA, NA, 3, NA)), row.names = 100:110, .Names = c("Endoscopist",
"NumbOfBx"), class = "data.frame")
My function is:
NumBx <- function(x, y, z) {
  x <- data.frame(x)
  x <- x[!is.na(x[,y]), ]
  NumBxPlot <- x %>% group_by_(z) %>% summarise(avg = mean(y, na.rm = T))
}
I call it with:
NumBx(vv,"Endoscopist","NumbOfBx")
This gives me the following warnings:
Warning messages:
1: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
2: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
3: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
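I changed the function to use summarise_ but got the same thing. Then I realised that it is summarise_ specifically (rather than group_by_) that needs standard evaluation, so I tried this (from this stackoverflow example):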
library(lazyeval)
NumBx <- function(x, y, z) {
  x <- data.frame(x)
  x <- x[!is.na(x[,y]), ]
  NumBxPlot <- x %>% group_by_(z) %>%
    summarise_(sum_val = interp(~mean(y, na.rm = TRUE), var = as.name(y)))
}
But I still get the same warnings:
Warning messages:
1: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
2: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
3: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
My expected output is:
Endoscopist Avg
Jupi Ter 4
John Boy 28
Phil Ip 3
Answer 0 (score: 2)
Using rlang (the replacement for lazyeval), you can do:
library(dplyr)
vv <- structure(list(Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ", "John Boy ", "John Boy ", "John Boy ", "Mar Gret ", "John Boy ", "Mar Gret ", "Phil Ip ", "Phil Ip "),
NumbOfBx = c(2, 4, NA, 2, 12, 12, NA, NA, NA, 3, NA)),
row.names = 100:110, .Names = c("Endoscopist", "NumbOfBx"), class = "data.frame")
num_bx <- function(.data, group, variable) {
  group <- enquo(group)
  variable <- enquo(variable)
  .data %>%
    tidyr::drop_na(!!variable) %>%
    group_by(!!group) %>%
    summarise(avg = mean(!!variable))
}
vv %>% num_bx(Endoscopist, NumbOfBx)
#> # A tibble: 3 x 2
#> Endoscopist avg
#> <chr> <dbl>
#> 1 John Boy 7
#> 2 Jupi Ter 4
#> 3 Phil Ip 3
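In more recent releases (rlang 0.4.0 and later, re-exported by dplyr), the enquo() / !! pair can usually be collapsed into the embrace operator {{ }}. Here is a minimal sketch of the same idea, assuming dplyr 1.0.0 or later. The name num_bx_embrace is made up for illustration and is not part of the answer above, and vv is rebuilt with data.frame() only to keep the block self-contained:

```r
library(dplyr)

# Same data as in the question, rebuilt here so the block runs on its own
vv <- data.frame(
  Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ", "John Boy ",
                  "John Boy ", "John Boy ", "Mar Gret ", "John Boy ",
                  "Mar Gret ", "Phil Ip ", "Phil Ip "),
  NumbOfBx = c(2, 4, NA, 2, 12, 12, NA, NA, NA, 3, NA)
)

# Hypothetical helper: {{ }} captures and unquotes an argument in one step
num_bx_embrace <- function(.data, group, variable) {
  .data %>%
    filter(!is.na({{ variable }})) %>%    # drop rows with a missing value
    group_by({{ group }}) %>%
    summarise(avg = mean({{ variable }}))
}

res <- num_bx_embrace(vv, Endoscopist, NumbOfBx)
res
```

It is called with unquoted column names, exactly like the enquo() version.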
Or, if you want to keep passing the column names as strings rather than unquoted names:
num_bx <- function(.data, group, variable) {
  group <- rlang::sym(group)
  variable <- rlang::sym(variable)
  .data %>%
    tidyr::drop_na(!!variable) %>%
    group_by(!!group) %>%
    summarise(avg = mean(!!variable))
}
vv %>% num_bx("Endoscopist", "NumbOfBx")
#> # A tibble: 3 x 2
#> Endoscopist avg
#> <chr> <dbl>
#> 1 John Boy 7
#> 2 Jupi Ter 4
#> 3 Phil Ip 3
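For string arguments, another option that avoids building symbols by hand is the .data pronoun together with across(all_of()). This is only a sketch, assuming dplyr 1.0.0 or later. The name num_bx_chr and the argument name df are hypothetical, chosen so that the data argument does not share a name with the pronoun:

```r
library(dplyr)

# Same data as in the question, rebuilt here so the block runs on its own
vv <- data.frame(
  Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ", "John Boy ",
                  "John Boy ", "John Boy ", "Mar Gret ", "John Boy ",
                  "Mar Gret ", "Phil Ip ", "Phil Ip "),
  NumbOfBx = c(2, 4, NA, 2, 12, 12, NA, NA, NA, 3, NA)
)

# Hypothetical helper: group and variable are character strings
num_bx_chr <- function(df, group, variable) {
  df %>%
    filter(!is.na(.data[[variable]])) %>%     # look the column up by name
    group_by(across(all_of(group))) %>%       # group by the named column
    summarise(avg = mean(.data[[variable]]))
}

res_chr <- num_bx_chr(vv, "Endoscopist", "NumbOfBx")
res_chr
```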
Answer 1 (score: 1)
Following the dplyr programming vignette, define your function as follows:
library(dplyr)
NumBx <- function( x, y, z )
{
  yy <- enquo( y )
  zz <- enquo( z )
  data.frame(x) %>% filter( !is.na(!!yy) ) %>% group_by( !!zz ) %>%
    summarize( avg = mean(!!yy) )
}
You can now call it as:
NumBx( vv, NumbOfBx, Endoscopist )
# Endoscopist avg
# <chr> <dbl>
# 1 John Boy 7
# 2 Jupi Ter 4
# 3 Phil Ip 3
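A few notes:
- Your function groups by z, but you were passing NumbOfBx as the z argument.
- na.rm=TRUE is redundant. You have already filtered out the rows where the y variable is NA.
- The average for John Boy should be 7, not 28 (the value in your expected output).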
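The last point is easy to check by hand. The non-missing NumbOfBx values for John Boy in vv are 2, 2, 12 and 12. The 28 in the expected output matches their sum, which is only a guess at where that number came from:

```r
# Non-missing NumbOfBx values recorded for "John Boy " in vv
john_boy <- c(2, 2, 12, 12)

total   <- sum(john_boy)    # 28, the figure in the question's expected output
average <- mean(john_boy)   # 7, the figure the functions above return

total
average
```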