Writing a custom function that works inside dplyr::mutate()

Time: 2018-05-22 16:35:11

Tags: r dplyr

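I am struggling to write a function that works inside dplyr::mutate().

Because rowwise() %>% sum() is very slow on large datasets, the suggested alternative is to fall back to base R. I would like to simplify that process as shown below, but I am having trouble passing the data when the function is called inside mutate().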

require(tidyverse)
#> Loading required package: tidyverse
#I'd like to write a function that works inside mutate and replaces the rowSums(select()).
cars <- as_tibble(cars)

cars %>% 
  mutate(sum = rowSums(select(., speed, dist), na.rm = T))
#> # A tibble: 50 x 3
#>    speed  dist   sum
#>    <dbl> <dbl> <dbl>
#>  1    4.    2.    6.
#>  2    4.   10.   14.
#>  3    7.    4.   11.
#>  4    7.   22.   29.
#>  5    8.   16.   24.
#>  6    9.   10.   19.
#>  7   10.   18.   28.
#>  8   10.   26.   36.
#>  9   10.   34.   44.
#> 10   11.   17.   28.
#> # ... with 40 more rows

#Here is my first attempt.
rowwise_sum <- function(data, ..., na.rm = FALSE) {
  columns <- rlang::enquos(...)

  data %>% 
    select(!!!columns) %>% 
    rowSums(na.rm = na.rm)
}

#Doesn't work as expected:
cars %>% mutate(sum = rowwise_sum(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".

#But alone it is creating a vector.
cars %>% rowwise_sum(speed, dist, na.rm = T)
#>  [1]   6  14  11  29  24  19  28  36  44  28  39  26  32  36  40  39  47
#> [18]  47  59  40  50  74  94  35  41  69  48  56  49  57  67  60  74  94
#> [35] 102  55  65  87  52  68  72  76  84  88  77  94 116 117 144 110

#Appears to not be getting the data passed.  Specifying with a dot works.
cars %>% mutate(sum = rowwise_sum(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#>    speed  dist   sum
#>    <dbl> <dbl> <dbl>
#>  1    4.    2.    6.
#>  2    4.   10.   14.
#>  3    7.    4.   11.
#>  4    7.   22.   29.
#>  5    8.   16.   24.
#>  6    9.   10.   19.
#>  7   10.   18.   28.
#>  8   10.   26.   36.
#>  9   10.   34.   44.
#> 10   11.   17.   28.
#> # ... with 40 more rows

So the question becomes: how can I pass the data from inside the function so that I do not have to include the dot every time?

rowwise_sum2 <- function(data, ..., na.rm = FALSE) {
  columns <- rlang::enquos(...)

  data %>% 
    select(!!!columns) %>% 
    rowSums(., na.rm = na.rm)
}

#Same error
cars %>% mutate(sum = rowwise_sum2(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".

#Same result
cars %>% rowwise_sum2(speed, dist, na.rm = T)
#>  [1]   6  14  11  29  24  19  28  36  44  28  39  26  32  36  40  39  47
#> [18]  47  59  40  50  74  94  35  41  69  48  56  49  57  67  60  74  94
#> [35] 102  55  65  87  52  68  72  76  84  88  77  94 116 117 144 110

#Same result
cars %>% mutate(sum = rowwise_sum2(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#>    speed  dist   sum
#>    <dbl> <dbl> <dbl>
#>  1    4.    2.    6.
#>  2    4.   10.   14.
#>  3    7.    4.   11.
#>  4    7.   22.   29.
#>  5    8.   16.   24.
#>  6    9.   10.   19.
#>  7   10.   18.   28.
#>  8   10.   26.   36.
#>  9   10.   34.   44.
#> 10   11.   17.   28.
#> # ... with 40 more rows

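Created on 2018-05-22 by the reprex package (v0.2.0).

Following akrun's answer below (please upvote it):

To paraphrase: just drop mutate() and do everything inside the new function.

Here is my final function, an update on his that also allows naming the sum column if desired.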

rowwise_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {

  columns <- rlang::enquos(...)

  data %>%
    select(!!! columns) %>%
    transmute(!!sum_col := rowSums(., na.rm = na.rm)) %>%
    bind_cols(data, .)
}
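
As a usage sketch (assuming dplyr is attached, and restating the function above so the snippet runs on its own), the sum_col argument lets the caller choose the name of the new column. The name "total" below is only an illustrative choice, not something from the original post:

```r
library(dplyr)

# Restated from the function above so this snippet is self-contained.
rowwise_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {
  columns <- rlang::enquos(...)

  data %>%
    select(!!!columns) %>%
    transmute(!!sum_col := rowSums(., na.rm = na.rm)) %>%
    bind_cols(data, .)
}

# "total" is an arbitrary column name chosen for illustration.
out <- rowwise_sum(as_tibble(cars), speed, dist, sum_col = "total", na.rm = TRUE)
names(out)
```

The result keeps the original columns and appends the row sums under the requested name.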

1 answer:

Answer 0: (score: 3)

We can place the ... at the end

rowwise_sum <- function(data, na.rm = FALSE,...) {
  columns <- rlang::enquos(...)
  data %>%
     select(!!!columns) %>%
     rowSums(na.rm = na.rm)
}

cars %>% 
     mutate(sum = rowwise_sum(., na.rm = TRUE, speed, dist))
# A tibble: 50 x 3
#   speed  dist   sum
#   <dbl> <dbl> <dbl>
# 1     4     2     6
# 2     4    10    14
# 3     7     4    11
# 4     7    22    29
# 5     8    16    24
# 6     9    10    19
# 7    10    18    28
# 8    10    26    36
# 9    10    34    44
#10    11    17    28
# ... with 40 more rows

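It also works without changing the position of ... (although placing it last is generally recommended). The main issue here is that the data (.) was not specified in the argument list inside mutate().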
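
To illustrate that point: inside mutate(), bare column names such as speed evaluate to the column vectors themselves, so in rowwise_sum(speed, dist) the vector speed ends up in the data argument and select() fails. One alternative sketch, which is not part of the original answer, is a helper that accepts the vectors directly and therefore needs no dot. The name row_sum_vec is hypothetical, and the helper relies only on base R's cbind() and rowSums():

```r
library(dplyr)

# Hypothetical helper: takes column vectors (not a data frame), binds them
# into a matrix, and sums across each row.
row_sum_vec <- function(..., na.rm = FALSE) {
  rowSums(cbind(...), na.rm = na.rm)
}

# mutate() evaluates speed and dist to vectors before they reach the helper.
res <- as_tibble(cars) %>%
  mutate(sum = row_sum_vec(speed, dist, na.rm = TRUE))
head(res$sum, 3)
```

The trade-off is that this helper only takes explicitly named columns; it does not support tidyselect helpers such as starts_with(), which the select()-based versions do.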

It would be easier to build the whole flow inside the function rather than doing only part of it there:

rowwise_sum2 <- function(data, na.rm = FALSE, ...) {
  columns <- rlang::enquos(...)
  data %>%
      select(!!! columns) %>%
      transmute(sum = rowSums(., na.rm = na.rm)) %>%
      bind_cols(data, .)
}

rowwise_sum2(cars, na.rm = TRUE, speed, dist)
# A tibble: 50 x 3
#   speed  dist   sum
#   <dbl> <dbl> <dbl>
# 1     4     2     6
# 2     4    10    14
# 3     7     4    11
# 4     7    22    29
# 5     8    16    24
# 6     9    10    19
# 7    10    18    28
# 8    10    26    36
# 9    10    34    44
#10    11    17    28