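I've just started learning how to write my own functions, and I'm trying to write a compute_means function for a particular kind of data frame. This question seems similar, but it was never answered, and I haven't found any other question that seems to address it.
My data looks like this: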
student <- c("alw", "alw", "bef", "bef")
semester <- c("autumn", "spring", "autumn", "spring" )
test1 <- c(87, 88, 90, 78)
test2 <- c(67, 78, 81, 88)
x <- data.frame(student, semester, test1, test2)
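What I'd like to do is write a function that computes means grouped by semester, or grouped by student and semester, or for just one student. I can get the grouped versions to work, but I get stuck when I try to compute the mean test scores for a single student. Here is what I have so far (the problematic part is the else if branch):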
compute_means <- function(df, student = NA, separate = FALSE){
  if (!separate & is.na(student)){
    df %>%
      group_by(semester) %>%
      summarise(count = n(), test1 = mean(test1), test2 = mean(test2)) %>%
      mutate(students = c("AllStudnts")) %>%
      select(students, semester: test2)
  }
  else if(!separate & !is.na(student)){
    df %>%
      filter(student == student) %>%
      group_by(semester) %>%
      summarise(count = n(), test1 = mean(test1), test2 = mean(test2)) %>%
      mutate(student = student)
  }
  else{
    df %>%
      group_by(student, semester) %>%
      summarise(count = n(), test = mean(test1), test2 = mean(test2))
  }
}
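compute_means(x) does what I expect: I get the means for all students by semester. compute_means(x, separate = TRUE) also does what I expect. However, compute_means(x, student = "alw") does not. I get the same result as if the filter() step weren't there at all, rather than the rows for alw only. I imagine this is easy to fix, but I can't figure out how.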
Answer 0 (score: 1)
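Here is a modified version of your function that gives you what you expect. I renamed the parameter student to student_name. I also removed the trailing mutate(student = student), since it didn't seem to be needed, and I added a piped ungroup to drop the remaining grouping, since you probably don't need it.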
library(dplyr)  # provides %>%, group_by(), summarise(), filter(), ...

compute_means <- function(df, student_name = NA, separate = FALSE){
  if (!separate & is.na(student_name)){
    # Means per semester across all students.
    df %>%
      group_by(semester) %>%
      summarise(count = n(), test1 = mean(test1), test2 = mean(test2)) %>%
      mutate(students = c("AllStudnts")) %>%
      select(students, semester: test2)
  }
  else if(!separate & !is.na(student_name)){
    # Means per semester for one student; student_name no longer clashes
    # with the student column, so the filter works as intended.
    df %>%
      filter(student == student_name) %>%
      group_by(semester) %>%
      summarise(count = n(), test1 = mean(test1), test2 = mean(test2))
  }
  else{
    # Means per student and semester.
    df %>%
      group_by(student, semester) %>%
      summarise(count = n(), test = mean(test1), test2 = mean(test2)) %>%
      ungroup()  # added since you don't need the remaining grouping.
  }
}
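As a small illustration of why the trailing ungroup was added, the sketch below (assuming dplyr is installed and using the same example data) checks which grouping variables remain after summarise. With its default behaviour, summarise drops only the last grouping variable, so student is still a grouping variable until ungroup is called. The name by_both is just an illustrative variable, not part of the original function.

```r
library(dplyr)

x <- data.frame(
  student  = c("alw", "alw", "bef", "bef"),
  semester = c("autumn", "spring", "autumn", "spring"),
  test1    = c(87, 88, 90, 78),
  test2    = c(67, 78, 81, 88)
)

# Group by two variables, then summarise: only `semester` is dropped.
by_both <- x %>%
  group_by(student, semester) %>%
  summarise(count = n(), test1 = mean(test1), test2 = mean(test2))

group_vars(by_both)           # "student" is still a grouping variable
group_vars(ungroup(by_both))  # no grouping variables remain
```

Leaving the grouping in place is harmless for printing, but later verbs such as mutate or summarise would silently operate per student, which is why dropping it is the safer default here.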
Given the input x
> x
student semester test1 test2
1 alw autumn 87 67
2 alw spring 88 78
3 bef autumn 90 81
4 bef spring 78 88
here are the results of calling the function compute_means:
> compute_means(x)
Source: local data frame [2 x 5]
students semester count test1 test2
(chr) (fctr) (int) (dbl) (dbl)
1 AllStudnts autumn 2 88.5 74
2 AllStudnts spring 2 83.0 83
> compute_means(x, separate = TRUE)
Source: local data frame [4 x 5]
Groups: student [?]
student semester count test test2
(fctr) (fctr) (int) (dbl) (dbl)
1 alw autumn 1 87 67
2 alw spring 1 88 78
3 bef autumn 1 90 81
4 bef spring 1 78 88
> compute_means(x, student_name = 'alw')
Source: local data frame [2 x 4]
semester count test1 test2
(fctr) (int) (dbl) (dbl)
1 autumn 1 87 67
2 spring 1 88 78
> compute_means(x, student_name = 'bef')
Source: local data frame [2 x 4]
semester count test1 test2
(fctr) (int) (dbl) (dbl)
1 autumn 1 90 81
2 spring 1 78 88
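Edit
What happens with filter(student == student) (in the OP's code) is that, inside filter, the name student on both sides of == refers to the student column of df, not to the function argument. The condition is therefore TRUE for every row, so nothing is filtered out.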
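The name clash can be reproduced outside the function. The following is a minimal sketch, assuming a reasonably recent dplyr (1.0 or later for the .env pronoun). It also shows two alternatives to renaming the argument: unquoting with !! and the .env pronoun. Neither is part of the answer's original code; renaming the argument as above remains the simplest fix.

```r
library(dplyr)

x <- data.frame(
  student  = c("alw", "alw", "bef", "bef"),
  semester = c("autumn", "spring", "autumn", "spring"),
  test1    = c(87, 88, 90, 78),
  test2    = c(67, 78, 81, 88)
)

student <- "alw"  # a variable that shares its name with a column

# Both sides resolve to the column, so every row passes the filter.
n_masked <- nrow(filter(x, student == student))

# !! evaluates `student` in the calling environment before the column masks it.
n_unquoted <- nrow(filter(x, student == !!student))

# The .env pronoun explicitly asks for the variable from the environment.
n_env <- nrow(filter(x, student == .env$student))

c(masked = n_masked, unquoted = n_unquoted, env = n_env)  # 4, 2, 2
```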