Using dplyr verbs inside your own R function

Date: 2017-04-26 03:16:54

Tags: r function dplyr

Consider the following dplyr processing of a data frame:

existing.df <- filter(existing.df, justanEx > 0) %>%
                arrange(desc(justanEx)) %>%
                mutate(mean = mean(justanEx), 
                median = median(justanEx),
                rank = seq_len(length(anotherVar)))

I have to do this many times in the work I am doing, so I tried to write a function for it:

df.overZ <- function(data, var){
        df <- data %>% filter(var > 0) %>%
                arrange_(desc((var))) %>%
                mutate(mean = mean(var),
                median = median(var),
                rank = seq_len(length(anotherVar)))
        df
} 

and then called it:

existing.df <- df.overZ(existing.df, "realVar")

But this gives me the following error:

Error in arrange_impl(.data, dots) : 
  incorrect size (1), expecting : 50000

If I try:

existing.df <- df.overZ(existing.df, realVar)

I get this error:

Error in filter_impl(.data, dots) : obj 'realVar' not found

I have already tried filter_, arrange_ and mutate_, but nothing works.

Is this possible?

The following function does work:

make.df <- function(var, n){
        df <- orign.df %>% filter(!is.na(var)) %>%
                select(1:2,n,3:6)
        df
}

existing.df <- make.df("oneVar",7)

1 Answer:

Answer 0 (score: 2)

With the devel version of dplyr (soon to be released as 0.6.0), we can use quosures:

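library(dplyr)
df.overZ <- function(data, Var){
  # Capture the bare column name passed by the caller as a quosure
  Var <- enquo(Var)
  data %>%
    filter(UQ(Var) > 0) %>%          # unquote to refer to the column
    arrange(desc(UQ(Var))) %>%
    mutate(Mean = mean(UQ(Var)),
           Median = median(UQ(Var)),
           rank = row_number())      # rank follows the sorted row order
}

df.overZ(iris, Sepal.Length)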
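The question's first call passed the column name as a string ("realVar"). As a minimal sketch of that calling style, assuming a reasonably recent dplyr in which the `.data` pronoun is available, the column can be looked up by name with `.data[[var]]`. The function name `df.overZ.chr` is hypothetical and not part of the original answer.

```r
library(dplyr)

# Hypothetical string-based variant of df.overZ (name chosen for illustration).
# `.data[[var]]` looks up the column whose name is stored in `var`.
df.overZ.chr <- function(data, var) {
  data %>%
    filter(.data[[var]] > 0) %>%
    arrange(desc(.data[[var]])) %>%
    mutate(Mean = mean(.data[[var]]),
           Median = median(.data[[var]]),
           rank = row_number())
}

res <- df.overZ.chr(iris, "Sepal.Length")
head(res, 3)
```

This keeps the call site close to what the question attempted, at the cost of quoting the column name.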

We can extend this function with a group_by option:

df.overZ2 <- function(data, Var, grpVar){
  # Capture both bare column names as quosures
  Var <- enquo(Var)
  grpVar <- enquo(grpVar)
  # Build the output column names, e.g. "Sepal.Length_Mean"
  newVar <- paste(quo_name(Var), c("Mean", "Median", "Rank"), sep="_")
  data %>%
    filter(UQ(Var) > 0) %>%
    arrange(desc(UQ(Var))) %>%
    group_by(UQ(grpVar)) %>%
    summarise(UQ(newVar[1]) := mean(UQ(Var)),
              UQ(newVar[2]) := median(UQ(Var)),
              UQ(newVar[3]) := n())   # note: this is the group size
}

df.overZ2(iris, Sepal.Length, Species)
# A tibble: 3 × 4
#    Species Sepal.Length_Mean Sepal.Length_Median Sepal.Length_Rank
#      <fctr>             <dbl>               <dbl>             <int>
#1     setosa             5.006                 5.0                50
#2 versicolor             5.936                 5.9                50
#3  virginica             6.588                 6.5                50
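For readers on a newer toolchain, the same grouped summary can be sketched with the embrace operator `{{ }}`, which needs rlang 0.4.0 or later, and with glue-style names on the left of `:=`, which needs a somewhat later rlang. This is an alternative spelling under those version assumptions, not the answer's original code; the function name `df.overZ2.embrace` is hypothetical.

```r
library(dplyr)

# Hypothetical rewrite of df.overZ2 using {{ }} instead of enquo() + UQ().
# "{{ Var }}_Mean" expands to a name such as "Sepal.Length_Mean".
df.overZ2.embrace <- function(data, Var, grpVar) {
  data %>%
    filter({{ Var }} > 0) %>%
    group_by({{ grpVar }}) %>%
    summarise("{{ Var }}_Mean" := mean({{ Var }}),
              "{{ Var }}_Median" := median({{ Var }}),
              "{{ Var }}_Rank" := n())
}

res2 <- df.overZ2.embrace(iris, Sepal.Length, Species)
res2
```

The `arrange()` step is left out here because `summarise()` does not depend on row order for the mean, median, or group size.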

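Here, enquo does a job similar to substitute from base R: it takes the input argument and converts it to a quosure. Then, inside filter/arrange/mutate/summarise/group_by, we unquote it (!! or UQ) so that it is evaluated. We can also name the output columns by unquoting a name (here a string built with quo_name) on the left-hand side (lhs) of the assignment operator (:=).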
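As a small illustration of the capture, unquote, and naming steps just described, here is a sketch that uses the `!!` spelling rather than `UQ()`. The helper name `mean_of` is hypothetical; the parentheses around `!!Var` are only there to make the precedence explicit.

```r
library(dplyr)

# Hypothetical helper: capture a bare column name, then unquote it twice,
# once to build the output column name and once inside mean().
mean_of <- function(data, Var) {
  Var <- enquo(Var)                           # capture the argument as a quosure
  newName <- paste0(quo_name(Var), "_Mean")   # e.g. "Sepal.Length_Mean"
  data %>%
    summarise(!!newName := mean((!!Var)))     # `:=` allows an unquoted name on the lhs
}

out <- mean_of(iris, Sepal.Length)
names(out)
```

A plain `=` would not work in the last step, because R does not allow an expression such as `!!newName` as an argument name; that is the reason `:=` is used.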