I'm struggling to write a function that works inside dplyr::mutate().
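Because rowwise() %>% sum() is very slow on large datasets, the suggested alternative is to fall back to base R's rowSums(). I'd like to simplify that as shown below, but I'm having trouble passing the data to the function inside mutate().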
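A minimal base-R sketch of the trade-off described above, assuming a purely numeric table. Here apply(x, 1, sum) stands in for row-by-row evaluation (roughly the cost pattern of rowwise(), not its actual implementation), while rowSums() handles the whole matrix in one call. The data frame and its columns a, b, c are hypothetical, and timings vary by machine, so none are quoted.

```r
# Hypothetical numeric data with a few missing values
set.seed(1)
n <- 1e5
df <- data.frame(a = runif(n), b = runif(n), c = runif(n))
df$a[sample(n, 100)] <- NA

# Row by row: sum() is called once per row
t_by_row <- system.time(by_row <- apply(df, 1, sum, na.rm = TRUE))

# Vectorised: one call over the whole matrix
t_vec <- system.time(vectorised <- rowSums(df, na.rm = TRUE))

# Same numbers, different cost
print(isTRUE(all.equal(by_row, vectorised)))
print(c(by_row = t_by_row[["elapsed"]], vectorised = t_vec[["elapsed"]]))
```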
require(tidyverse)
#> Loading required package: tidyverse
#I'd like to write a function that works inside mutate and replaces the rowSums(select()).
cars <- as_tibble(cars)
cars %>%
mutate(sum = rowSums(select(., speed, dist), na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
#Here is my first attempt.
rowwise_sum <- function(data, ..., na.rm = FALSE) {
columns <- rlang::enquos(...)
data %>%
select(!!!columns) %>%
rowSums(na.rm = na.rm)
}
# Doesn't work as expected:
cars %>% mutate(sum = rowwise_sum(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".
# But on its own it returns a vector.
cars %>% rowwise_sum(speed, dist, na.rm = T)
#> [1] 6 14 11 29 24 19 28 36 44 28 39 26 32 36 40 39 47
#> [18] 47 59 40 50 74 94 35 41 69 48 56 49 57 67 60 74 94
#> [35] 102 55 65 87 52 68 72 76 84 88 77 94 116 117 144 110
# It appears the data isn't being passed in. Supplying the dot explicitly works.
cars %>% mutate(sum = rowwise_sum(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
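One way to see why the call without the dot fails: mutate() evaluates rowwise_sum(speed, dist, na.rm = T) with the columns visible as plain vectors, so the first positional argument, data, receives the numeric vector speed, and select() is then dispatched on a double. The sketch below uses base R's with() only as a rough analogy for that data masking (mutate() uses its own data mask, not with()); show_first_arg() is a hypothetical helper that reports what lands in data without evaluating the dots.

```r
# Hypothetical helper: reports what ends up in `data`; ...length() counts
# the remaining arguments without evaluating them
show_first_arg <- function(data, ..., na.rm = FALSE) {
  list(class = class(data), length = length(data), n_dots = ...length())
}

# Columns exposed as vectors, similar to inside mutate():
# `data` gets speed, only dist reaches `...`
str(with(datasets::cars, show_first_arg(speed, dist, na.rm = TRUE)))

# Passing the whole data frame first, like the `.` inside mutate()
str(show_first_arg(datasets::cars, speed, dist, na.rm = TRUE))
```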
So the question becomes: how can I pass the data inside the function so that I don't have to include the dot every time?
rowwise_sum2 <- function(data, ..., na.rm = FALSE) {
columns <- rlang::enquos(...)
data %>%
select(!!!columns) %>%
rowSums(., na.rm = na.rm)
}
#Same error
cars %>% mutate(sum = rowwise_sum2(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".
#Same result
cars %>% rowwise_sum2(speed, dist, na.rm = T)
#> [1] 6 14 11 29 24 19 28 36 44 28 39 26 32 36 40 39 47
#> [18] 47 59 40 50 74 94 35 41 69 48 56 49 57 67 60 74 94
#> [35] 102 55 65 87 52 68 72 76 84 88 77 94 116 117 144 110
#Same result
cars %>% mutate(sum = rowwise_sum2(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
Created on 2018-05-22 by the reprex package (v0.2.0).
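EDIT: Following akrun's answer below (please upvote it).
To paraphrase: just drop mutate() and do everything inside the new function.
Here is my final function, an update of his that also lets you name the sum column if needed.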
rowwise_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {
columns <- rlang::enquos(...)
data %>%
select(!!! columns) %>%
transmute(!!sum_col := rowSums(., na.rm = na.rm)) %>%
bind_cols(data, .)
}
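A short usage sketch for the final function, assuming dplyr is installed; the function is repeated so the block runs on its own. The small tibble with one missing value is made up purely to exercise sum_col and na.rm.

```r
library(dplyr)

rowwise_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {
  columns <- rlang::enquos(...)
  data %>%
    select(!!! columns) %>%
    transmute(!!sum_col := rowSums(., na.rm = na.rm)) %>%
    bind_cols(data, .)
}

# Hypothetical data with one NA
df <- tibble(x = c(1, 2, NA), y = c(10, 20, 30))

# Custom column name; the NA propagates into the sum
rowwise_sum(df, x, y, sum_col = "total")

# Same, but NAs are dropped before summing
rowwise_sum(df, x, y, sum_col = "total", na.rm = TRUE)
```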
Answer (score: 3)
We can put the ... at the end of the argument list:
rowwise_sum <- function(data, na.rm = FALSE,...) {
columns <- rlang::enquos(...)
data %>%
select(!!!columns) %>%
rowSums(na.rm = na.rm)
}
cars %>%
mutate(sum = rowwise_sum(., na.rm = TRUE, speed, dist))
# A tibble: 50 x 3
# speed dist sum
# <dbl> <dbl> <dbl>
# 1 4 2 6
# 2 4 10 14
# 3 7 4 11
# 4 7 22 29
# 5 8 16 24
# 6 9 10 19
# 7 10 18 28
# 8 10 26 36
# 9 10 34 44
#10 11 17 28
# ... with 40 more rows
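It also works without moving the ... (although putting it last is generally recommended). The main issue here is that the dot (.) was not supplied as the data argument in the call inside mutate().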
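The point about the dot comes down to how R matches arguments: named arguments are matched first, then the remaining unnamed ones fill the formals positionally, and whatever is left goes into .... A base-R sketch with a hypothetical show_args() that only captures its inputs (it never evaluates them, so the symbols need not exist):

```r
# Hypothetical helper with the same signature as the answer's rowwise_sum()
show_args <- function(data, na.rm = FALSE, ...) {
  list(
    data  = deparse(substitute(data)),               # expression bound to `data`
    na.rm = na.rm,
    dots  = as.character(substitute(list(...)))[-1]  # expressions captured by `...`
  )
}

# With the dot: `.` fills `data`, both columns land in `...`
str(show_args(., na.rm = TRUE, speed, dist))

# Without the dot: `speed` is matched to `data`, only `dist` reaches `...`
str(show_args(na.rm = TRUE, speed, dist))
```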
It would be easier to build the whole workflow inside the function instead of doing only part of it:
rowwise_sum2 <- function(data, na.rm = FALSE, ...) {
columns <- rlang::enquos(...)
data %>%
select(!!! columns) %>%
transmute(sum = rowSums(., na.rm = na.rm)) %>%
bind_cols(data, .)
}
rowwise_sum2(cars, na.rm = TRUE, speed, dist)
# A tibble: 50 x 3
# speed dist sum
# <dbl> <dbl> <dbl>
# 1 4 2 6
# 2 4 10 14
# 3 7 4 11
# 4 7 22 29
# 5 8 16 24
# 6 9 10 19
# 7 10 18 28
# 8 10 26 36
# 9 10 34 44
#10 11 17 28
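With newer dplyr releases the dot can be avoided while staying inside mutate(): pick() (added in dplyr 1.1.0, well after this question was asked) returns the selected columns of the current data as a tibble, and it can be called from a helper that mutate() invokes. A sketch under those assumptions; row_sum() is a hypothetical name, and no claim is made about its speed relative to the functions above.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for pick()
stopifnot(packageVersion("dplyr") >= "1.1.0")

# Hypothetical helper: row sums of the chosen columns of the current data
row_sum <- function(..., na.rm = FALSE) {
  rowSums(pick(...), na.rm = na.rm)
}

cars_tbl <- as_tibble(datasets::cars)

# No `.` needed; used like any other mutate() expression
cars_tbl %>%
  mutate(sum = row_sum(speed, dist, na.rm = TRUE))
```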