Using dplyr::filter() inside a function

Date: 2016-01-17 21:26:00

Tags: r dplyr

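I have just started learning how to write my own functions, and I am trying to write a compute_means function for a particular kind of data frame. This question seems similar, but it never got an answer, and I haven't found any other question that seems to solve it.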

My data looks like this:

student <- c("alw", "alw", "bef", "bef")
semester <- c("autumn", "spring", "autumn", "spring" )
test1 <- c(87, 88, 90, 78)
test2 <- c(67, 78, 81, 88)

x <- data.frame(student, semester, test1, test2)

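What I would like is a single function that computes means grouped by semester, grouped by student and semester, or for just one student. I can get the grouped versions working, but I get stuck when I try to compute the mean test scores for a single student. Here is what I have so far (the problematic part is the else if branch):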

compute_means <- function(df, student = NA, separate = FALSE){
    if (!separate & is.na(student)){
        df %>%
            group_by(semester) %>%
            summarise(count = n(), test1 = mean(test1), test2 = mean(test2)) %>%
            mutate(students = c("AllStudnts")) %>%
            select(students, semester:test2)
    } else if (!separate & !is.na(student)){
        df %>%
            filter(student == student) %>%
            group_by(semester) %>%
            summarise(count = n(), test1 = mean(test1), test2 = mean(test2)) %>%
            mutate(student = student)
    } else {
        df %>%
            group_by(student, semester) %>%
            summarise(count = n(), test = mean(test1), test2 = mean(test2))
    }
}

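compute_means(x) does what I expect: I get the means across all students, by semester. compute_means(x, separate = TRUE) also does what I expect. However, compute_means(x, student = "alw") does not: instead of filter() restricting the data to alw, I get the same means I would get for all students. I suspect the fix is simple, but I can't work out what it is.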

1 Answer:

Answer 0 (score: 1)

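Below is a modified version of your function that gives you what you expect. I renamed the argument student to student_name. I also removed the trailing mutate(student = student), since it doesn't seem to be needed, and I added ungroup to the end of the last pipeline to drop the remaining grouping, which you probably don't need either.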

library(dplyr)

compute_means <- function(df, student_name = NA, separate = FALSE){
    if (!separate && is.na(student_name)){
        df %>%
            group_by(semester) %>%
            summarise(count = n(), test1 = mean(test1), test2 = mean(test2)) %>%
            mutate(students = c("AllStudnts")) %>%
            select(students, semester:test2)
    } else if (!separate && !is.na(student_name)){
        # student is the column of df; student_name is the function argument
        df %>%
            filter(student == student_name) %>%
            group_by(semester) %>%
            summarise(count = n(), test1 = mean(test1), test2 = mean(test2))
    } else {
        df %>%
            group_by(student, semester) %>%
            summarise(count = n(), test = mean(test1), test2 = mean(test2)) %>%
            ungroup() # added since you don't need the remaining grouping.
    }
}

Starting from the input x:
> x
  student semester test1 test2
1     alw   autumn    87    67
2     alw   spring    88    78
3     bef   autumn    90    81
4     bef   spring    78    88

Here is the output of various calls to compute_means:
> compute_means(x)
Source: local data frame [2 x 5]

    students semester count test1 test2
       (chr)   (fctr) (int) (dbl) (dbl)
1 AllStudnts   autumn     2  88.5    74
2 AllStudnts   spring     2  83.0    83
> compute_means(x, separate = TRUE)
Source: local data frame [4 x 5]
Groups: student [?]

  student semester count  test test2
   (fctr)   (fctr) (int) (dbl) (dbl)
1     alw   autumn     1    87    67
2     alw   spring     1    88    78
3     bef   autumn     1    90    81
4     bef   spring     1    78    88
> compute_means(x, student_name = 'alw')
Source: local data frame [2 x 4]

  semester count test1 test2
    (fctr) (int) (dbl) (dbl)
1   autumn     1    87    67
2   spring     1    88    78
> compute_means(x, student_name = 'bef')
Source: local data frame [2 x 4]

  semester count test1 test2
    (fctr) (int) (dbl) (dbl)
1   autumn     1    90    81
2   spring     1    78    88
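If you would rather keep the argument named student, a possible alternative is dplyr's .env pronoun, which tells a data-masking verb such as filter() to look the name up in the calling environment instead of in the data frame. This is a minimal sketch that assumes a recent dplyr (1.0 or later, where .env is documented for data-masking verbs); the function name compute_means_env is hypothetical, and only the single-student branch is shown.

```r
library(dplyr)

# Hypothetical variant that keeps the argument name `student`.
# The bare `student` on the left of `==` is the data column;
# `.env$student` is the function argument.
compute_means_env <- function(df, student) {
  df %>%
    filter(student == .env$student) %>%
    group_by(semester) %>%
    summarise(count = n(), test1 = mean(test1), test2 = mean(test2)) %>%
    ungroup()
}

x <- data.frame(
  student  = c("alw", "alw", "bef", "bef"),
  semester = c("autumn", "spring", "autumn", "spring"),
  test1    = c(87, 88, 90, 78),
  test2    = c(67, 78, 81, 88)
)

compute_means_env(x, "alw")
```

Renaming the argument, as in the answer, avoids the ambiguity entirely; .env is useful mainly when the argument name has to match a column name.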

EDIT

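What happens with filter(student == student) in the OP's code is that, inside filter(), the name student on both sides of == refers to the student column of df, not to the function argument. Each value is compared with itself, so the condition is TRUE for every row (apart from any NA values) and nothing is filtered out. Renaming the argument to student_name removes that ambiguity.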
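A minimal check of this explanation, using the question's data: with the self-comparison every row passes the filter, while forcing the right-hand side to be the argument's value keeps only one student. The helper names filter_masked and filter_injected are hypothetical, and the !! (injection) operator assumes dplyr 0.7.0 or later.

```r
library(dplyr)

x <- data.frame(
  student  = c("alw", "alw", "bef", "bef"),
  semester = c("autumn", "spring", "autumn", "spring"),
  test1    = c(87, 88, 90, 78),
  test2    = c(67, 78, 81, 88)
)

# Both `student`s resolve to the column x$student, so the condition
# is TRUE for every non-NA row and the filter has no effect.
filter_masked <- function(df, student) {
  filter(df, student == student)
}

# `!!` evaluates `student` in the function's environment before
# filter() sees the expression, so the right-hand side is "alw".
filter_injected <- function(df, student) {
  filter(df, student == !!student)
}

nrow(filter_masked(x, "alw"))    # 4: no rows removed
nrow(filter_injected(x, "alw"))  # 2: only the "alw" rows
```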