如何在dplyr summaryrise_中使用标准评估

时间:2017-08-29 15:18:43

标签: r dplyr

我看过几个地方,但我无法弄清楚如何做到这一点。看起来它已经改变了几次,所以更加令人困惑

我想将Endoscopist的NumOfBx作为函数的一部分进行总结。我有以下数据框

vv <- structure(list(Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ", 
"John Boy ", "John Boy ", "John Boy ", "Mar Gret ", "John Boy ", 
"Mar Gret ", "Phil Ip ", "Phil Ip "), NumbOfBx = c(2, 4, NA, 
2, 12, 12, NA, NA, NA, 3, NA)), row.names = 100:110, .Names = c("Endoscopist", 
"NumbOfBx"), class = "data.frame")

我的功能是:

NumBx <- function(x, y, z) {
  x <- data.frame(x)
  x <- x[!is.na(x[,y]), ]
  NumBxPlot <- x %>% group_by_(z) %>% summarise(avg = mean(y, na.rm = T))
}

我打电话给:

NumBx(vv,"Endoscopist","NumOfBx)

这给了我错误:

Warning messages:
1: In mean.default(y, na.rm = T) :
  argument is not numeric or logical: returning NA
2: In mean.default(y, na.rm = T) :
  argument is not numeric or logical: returning NA
3: In mean.default(y, na.rm = T) :
  argument is not numeric or logical: returning NA

我将功能更改为使用summarise_

但我得到同样的东西。然后我意识到需要summarise_专门(而不是group_by_)需要标准评估,我尝试了这个(来自this stackoverflow example

library(lazyeval)
NumBx <- function(x, y, z) {
  x <- data.frame(x)
  x <- x[!is.na(x[,y]), ]
  NumBxPlot <- x %>% group_by_(z) %>% 
      summarise_(sum_val = interp(~mean(y, na.rm = TRUE), var = as.name(y)))

但我仍然得到同样的错误:

Warning messages:
1: In mean.default(y, na.rm = T) :
  argument is not numeric or logical: returning NA
2: In mean.default(y, na.rm = T) :
  argument is not numeric or logical: returning NA
3: In mean.default(y, na.rm = T) :
  argument is not numeric or logical: returning NA

我的预期输出是:

Endoscopist   Avg
Jupi Ter       4
John Boy       28
Phil Ip        3

2 个答案:

答案 0 :(得分:2)

使用rlang(lazyeval的替代品),您可以

library(dplyr)

vv <- structure(list(Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ", "John Boy ", "John Boy ", "John Boy ", "Mar Gret ", "John Boy ", "Mar Gret ", "Phil Ip ", "Phil Ip "), 
                     NumbOfBx = c(2, 4, NA, 2, 12, 12, NA, NA, NA, 3, NA)), 
                row.names = 100:110, .Names = c("Endoscopist", "NumbOfBx"), class = "data.frame")

num_bx <- function(.data, group, variable) {
    group <- enquo(group)
    variable <- enquo(variable)

    .data %>% 
        tidyr::drop_na(!!variable) %>% 
        group_by(!!group) %>% 
        summarise(avg = mean(!!variable))
}

vv %>% num_bx(Endoscopist, NumbOfBx)
#> # A tibble: 3 x 2
#>   Endoscopist   avg
#>         <chr> <dbl>
#> 1   John Boy      7
#> 2   Jupi Ter      4
#> 3    Phil Ip      3

或者如果您想将其保留为字符串而不是不带引号的名称,

num_bx <- function(.data, group, variable) {
    group <- rlang::sym(group)
    variable <- rlang::sym(variable)

    .data %>% 
        tidyr::drop_na(!!variable) %>% 
        group_by(!!group) %>% 
        summarise(avg = mean(!!variable))
}

vv %>% num_bx("Endoscopist", "NumbOfBx")
#> # A tibble: 3 x 2
#>   Endoscopist   avg
#>         <chr> <dbl>
#> 1   John Boy      7
#> 2   Jupi Ter      4
#> 3    Phil Ip      3

答案 1 :(得分:1)

dplyr programming vignette之后,按如下方式定义您的函数:

NumBx <- function( x, y, z )
{
    yy <- enquo( y )
    zz <- enquo( z )

    data.frame(x) %>% filter( !is.na(!!yy) ) %>% group_by( !!zz ) %>%
        summarize( avg = mean(!!yy) )
}

您现在可以将其命名为:

NumBx( vv, NumbOfBx, Endoscopist )
#   Endoscopist   avg
#         <chr> <dbl>
# 1   John Boy      7
# 2   Jupi Ter      4
# 3    Phil Ip      3

一些注意事项:

  1. 您通话中的参数顺序似乎相反。您希望按z进行分组,但是您将NumbOfBx作为z参数传递。
  2. na.rm=TRUE是多余的。您已经过滤了y变量为NA的行。
  3. John Boy的平均值应为7,而不是28(预期输出中的值)。