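I want to assign a group's maximum value to every row in that group. How can I do this?
I have a data frame that contains each course code and the maximum number of credits for that course.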
course_credits <- aggregate(bsc_academic$Credits, by = list(bsc_academic$Course_code), max)
which gives
Course Credits
1 ABC1000 6.5
2 ABC1003 6.5
3 ABC1004 6.5
4 ABC1007 5.0
5 ABC1010 6.5
6 ABC1021 6.5
7 ABC1023 6.5
The main data frame looks like this:
Appraisal.Type Resits Credits Course_code Student_ID
Final result 0 6.5 ABC1000 10
Final result 0 6.5 ABC1003 10
Grade supervisor 0 0 ABC1000 10
Grade supervisor 0 0 ABC1003 10
Final result 0 12 ABC1294 23
Grade supervisor 0 0 ABC1294 23
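As you can see, student 10 took course ABC1000, which is worth 6.5 credits. However, each course (per student) has two rows: "Final result" and "Grade supervisor". In the end, the "Final result" rows should be removed, but their credits should be kept. So I want to assign the maximum value, 6.5, to the "Grade supervisor" row. Likewise, student 23 took course ABC1294, which is worth 12 credits.
In the end, this should be the result: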
Appraisal.Type Resits Credits Course_code Student_ID
Grade supervisor 0 6.5 ABC1000 10
Grade supervisor 0 6.5 ABC1003 10
Grade supervisor 0 12 ABC1294 23
How can I do this?
Answer 0 (score: 2)
One option is to group by 'Student_ID', mutate 'Credits' to the group max, and then filter the rows whose 'Appraisal.Type' is "Grade supervisor".
library(dplyr)
df1 %>%
group_by(Student_ID) %>%
dplyr::mutate(Credits = max(Credits)) %>%
ungroup %>%
filter(Appraisal.Type == "Grade supervisor")
# A tibble: 2 x 5
# Appraisal.Type Resits Credits Course_code Student_ID
# <chr> <int> <dbl> <chr> <int>
#1 Grade supervisor 0 6.5 ABC1000 10
#2 Grade supervisor 0 6.5 ABC1003 10
If we also need to include 'Course_code' in the grouping:
df2 %>%
group_by(Student_ID, Course_code) %>%
dplyr::mutate(Credits = max(Credits)) %>%
filter(Appraisal.Type == "Grade supervisor")
# A tibble: 3 x 5
# Groups: Student_ID, Course_code [3]
# Appraisal.Type Resits Credits Course_code Student_ID
# <chr> <int> <dbl> <chr> <int>
#1 Grade supervisor 0 6.5 ABC1000 10
#2 Grade supervisor 0 6.5 ABC1003 10
#3 Grade supervisor 0 12 ABC1294 23
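Note: If the plyr package is also loaded, functions such as summarise/mutate may be masked, because functions with the same names also exist in plyr. To prevent this, either run the code in a fresh session without loading plyr, or call dplyr::mutate explicitly.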
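To check whether masking is actually happening in a given session, something like the following may help. It is only a sketch using base R introspection; the variable name `mutate_source` is made up for illustration.

```r
library(dplyr)

# Attached packages that export `mutate`, in search-path order;
# the first entry is the one an unqualified call resolves to.
find("mutate")

# Namespace of the function that a bare `mutate` currently refers to.
# If this is not "dplyr", qualify the call as dplyr::mutate().
mutate_source <- environmentName(environment(mutate))
mutate_source
```

If plyr was attached after dplyr, the first entry reported by `find()` would be plyr's, which is the situation the note describes.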
df1 <- structure(list(Appraisal.Type = c("Final result", "Final result",
"Grade supervisor", "Grade supervisor"), Resits = c(0L, 0L, 0L,
0L), Credits = c(6.5, 6.5, 0, 0), Course_code = c("ABC1000",
"ABC1003", "ABC1000", "ABC1003"), Student_ID = c(10L, 10L, 10L,
10L)), class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(Appraisal.Type = c("Final result", "Final result",
"Grade supervisor", "Grade supervisor", "Final result", "Grade supervisor"
), Resits = c(0L, 0L, 0L, 0L, 0L, 0L), Credits = c(6.5, 6.5,
0, 0, 12, 0), Course_code = c("ABC1000", "ABC1003", "ABC1000",
"ABC1003", "ABC1294", "ABC1294"), Student_ID = c(10L, 10L, 10L,
10L, 23L, 23L)), class = "data.frame", row.names = c(NA, -6L))
Answer 1 (score: 0)
Generate a sample dataset.
data <- as.data.frame(list(Appraisal.Type = c(rep("Final result", 2), rep("Grade supervisor", 2)),
Resits = rep(0, 4),
Credits = c(rep(6.5, 2), rep(0, 2)),
Course_code = rep(c("ABC1000", "ABC1003"), 2),
Student_ID = rep(10, 4)))
Assign each group's maximum to all rows in that group, then remove the rows containing "Final result".
## Reassign "Credits" to the maximum within each course
for (code in unique(data$Course_code)) {
  in_course <- data$Course_code == code
  data$Credits[in_course] <- max(data$Credits[in_course])
}
## New dataset without the "Final result" rows
data <- data[data$Appraisal.Type != "Final result", ]
Here is the result.
data
Appraisal.Type Resits Credits Course_code Student_ID
3 Grade supervisor 0 6.5 ABC1000 10
4 Grade supervisor 0 6.5 ABC1003 10
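Note that the loop above groups by Course_code only, so it would mix students if two students took the same course with different credits. As a possible alternative, here is a minimal base R sketch that uses `ave()` with a combined student/course key. The sample data mirrors the question's six rows, and the names `key` and `result` are chosen here for illustration.

```r
# Sample data matching the question (two students, three courses)
data <- data.frame(
  Appraisal.Type = c("Final result", "Final result", "Grade supervisor",
                     "Grade supervisor", "Final result", "Grade supervisor"),
  Resits = 0,
  Credits = c(6.5, 6.5, 0, 0, 12, 0),
  Course_code = c("ABC1000", "ABC1003", "ABC1000",
                  "ABC1003", "ABC1294", "ABC1294"),
  Student_ID = c(10, 10, 10, 10, 23, 23)
)

# One key per (student, course) pair, so only combinations that occur are grouped
key <- paste(data$Student_ID, data$Course_code)

# ave() returns a vector as long as its input, holding each group's max
data$Credits <- ave(data$Credits, key, FUN = max)

# Keep only the "Grade supervisor" rows
result <- data[data$Appraisal.Type == "Grade supervisor", ]
result
```

Because `ave()` returns a vector of the same length as the input, no explicit loop or merge is needed.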
Answer 2 (score: 0)
Here is a data.table solution,
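The code for this answer is not included here. As a rough sketch of what a data.table approach could look like (not necessarily what the answer's author wrote), assuming the same six rows as in the question and grouping by both Student_ID and Course_code:

```r
library(data.table)

# Sample data matching the question; `dt` and `result` are illustrative names
dt <- data.table(
  Appraisal.Type = c("Final result", "Final result", "Grade supervisor",
                     "Grade supervisor", "Final result", "Grade supervisor"),
  Resits = 0L,
  Credits = c(6.5, 6.5, 0, 0, 12, 0),
  Course_code = c("ABC1000", "ABC1003", "ABC1000",
                  "ABC1003", "ABC1294", "ABC1294"),
  Student_ID = c(10L, 10L, 10L, 10L, 23L, 23L)
)

# Overwrite Credits by reference with the max of each (student, course) group
dt[, Credits := max(Credits), by = .(Student_ID, Course_code)]

# Keep only the "Grade supervisor" rows
result <- dt[Appraisal.Type == "Grade supervisor"]
result
```

The `:=` operator updates the column in place, so the grouped maximum is written to every row of the group before the filter is applied.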