Question

I have created a simple pivot table with R's dplyr package. Here is my working example:

library(dplyr)
mean_mpg <- mean(mtcars$mpg)

# create a new variable that flags whether Miles/(US) gallon is greater than the mean

mtcars <-
mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean_mpg, 1,0))

mtcars %>%
  group_by(as.factor(cyl)) %>%
  summarise(sum=sum(mpg_cat),total=n()) %>%
  mutate(percentage=sum*100/total)

Now I want to write a function so that I can reuse this code:

get_pivot <- function(data, predictor,target) {
  result <-
    data %>%
    group_by(as.factor(predictor)) %>%
    summarise(sum=sum(target),total=n()) %>%
    mutate(percentage=sum*100/total);

  print(result)
}

But I get the following error:

Error in is.factor(x) : object 'cyl' not found

I also tried

get_pivot(mtcars, "cyl", "mpg_cat" )

but it didn't work either.

What should I do?

Answer 1

If you have rlang v0.4.0 (released June 2019) or later, you can use the double curly braces {{ }} (aka "curly-curly") to make programming with dplyr much simpler.

# Note: needs installation of rlang 0.4.0 or later
get_pivot <- function(data, predictor,target) {
  result <-
    data %>%
    group_by(as.factor( {{ predictor }} )) %>%
    summarise(sum=sum( {{ target }} ),total=n()) %>%
    mutate(percentage=sum*100/total)

  print(result)
}

# Edit -- thank you Rui Barradas
> get_pivot(mtcars, cyl, mpg_cat)
# A tibble: 3 x 4
  `as.factor(cyl)`   sum total percentage
  <fct>            <dbl> <int>      <dbl>
1 4                   11    11      100  
2 6                    3     7       42.9
3 8                    0    14        0
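The question also tried get_pivot(mtcars, "cyl", "mpg_cat") with quoted names. The {{ }} version above expects bare column names (cyl, mpg_cat), not strings. If you would rather pass column names as character strings, one option (not part of the original answer) is to look the columns up with dplyr's .data pronoun. The sketch below assumes a hypothetical function name get_pivot_str and a grouping column called group; both are illustrative choices.

```r
library(dplyr)

# Hypothetical string-based variant (not from the original answer):
# the column names arrive as character strings and are looked up
# with the .data pronoun instead of being embraced with {{ }}.
get_pivot_str <- function(data, predictor, target) {
  data %>%
    group_by(group = as.factor(.data[[predictor]])) %>%
    summarise(sum = sum(.data[[target]]), total = n()) %>%
    mutate(percentage = sum * 100 / total)
}

# Same preparation step as in the question
mtcars2 <- mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean(mpg), 1, 0))

get_pivot_str(mtcars2, "cyl", "mpg_cat")
```

Quoted names are convenient when the column names come from a character vector, for example when looping over several predictors.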

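This is necessary because dplyr and the other tidyverse packages use "non-standard evaluation", just as some base R functions do (for example lm(mpg~factor(am),data=mtcars), where mpg and am are looked up in mtcars rather than in your workspace). This usually makes interactive code shorter, simpler and easier to read, but at the cost of making programming with it more complex. Here, the {{ }} operator forwards ("tunnels") the column you pass as a function argument into the data context of the dplyr verbs inside the function.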
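Before rlang 0.4.0, the same forwarding was written in two steps: enquo() captures the caller's bare argument as a quosure, and !! unquotes it inside the dplyr verb. As far as I understand the rlang 0.4.0 announcement, {{ x }} is shorthand for this !!enquo(x) pattern. The sketch below spells it out; the function name get_pivot_enquo is hypothetical.

```r
library(dplyr)
library(rlang)

# Spelled-out form of the {{ }} version (hypothetical name):
# enquo() captures the bare column names the caller typed,
# and !! unquotes them where dplyr evaluates its arguments.
get_pivot_enquo <- function(data, predictor, target) {
  predictor <- enquo(predictor)
  target <- enquo(target)
  data %>%
    group_by(as.factor(!!predictor)) %>%
    summarise(sum = sum(!!target), total = n()) %>%
    mutate(percentage = sum * 100 / total)
}

# Same preparation step as in the question
mtcars2 <- mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean(mpg), 1, 0))

get_pivot_enquo(mtcars2, cyl, mpg_cat)
```

With rlang 0.4.0 or later, the {{ }} form in the answer is the more concise way to write the same thing.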

https://www.tidyverse.org/articles/2019/06/rlang-0-4-0/
