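I'm new to R and to coding. I've been trying to solve a problem with a report I'm building and have hit a wall. I've spent the last two days looking for a workable answer and am now at my wits' end.
I have a data frame of student results. The columns are as follows
I'm trying to create a summary/list showing the pass percentage for each module in which students were active. Something like this: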
Year Semester Qualification Module PassRate
2014 Jan Qual1 Subject1 62.54%
2014 Jan Qual1 Subject2 72.81%
.
.
.
2014 July Qual1 Subject1 69.51%
.
.
2014 Jan Qual2 Subject1 42.86%
2014 Jan Qual2 Subject3 55.95%
etc.
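I thought an IF statement might work, but that seems too cumbersome. I also looked at For each, but I can't figure out how to make it work, or how to combine the two. I've tried aggregate, count =, cbind and everything else my good friend Google could find.
I have the following code: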
AcademicYears <- as.character(unique(unlist(HE_Stats$Year)))
AcademicYears_count <- NROW(AcademicYears)
AcademicSemesters <- as.character(unique(unlist(HE_Stats$ActualSemester)))
AcademicSemesters_count <- NROW(AcademicSemesters)
Qualifications <- as.character(unique(unlist(HE_Stats$Qualification)))
Qualifications_count <- NROW(Qualifications)
Modules <- as.character(unique(unlist(HE_Stats$ModuleCode)))
Modules_count <- NROW(Modules)
df <- HE_Stats %>%
  group_by(Year, ActualSemester, Qualification, ModuleCode) %>%
  aggregate(cbind(count = AcademicSemesters) ~ AcademicYears,
            data = HE_Stats,
            FUN = function(AcademicSemesters){NROW(AcademicSemesters)})
The result is that it shows me one semester per year. My latest plan is to build a matrix column by column.
Answer 0 (score: 0)
If you can provide sample data, you'll get a better answer. But say your data looks like this (this solution uses the dplyr package):
library(dplyr)
data <- tibble(student_number = c(1, 2, 3, 4, 5, 6),
               academic_year = c(2014, 2014, 2014, 2015, 2015, 2015),
               semester = c("jan", "jan", "jan", "jan", "june", "june"),
               qualification = c("qual1", "qual2", "qual1", "qual1", "qual2",
                                 "qual2"),
               module = c("subject1", "subject1", "subject1", "subject1",
                          "subject2", "subject2"),
               result = c("passed", "failed", "passed", "passed", "passed",
                          "failed"))
# A tibble: 6 x 6
student_number academic_year semester qualification module result
<dbl> <dbl> <chr> <chr> <chr> <chr>
1 1 2014 jan qual1 subject1 passed
2 2 2014 jan qual2 subject1 failed
3 3 2014 jan qual1 subject1 passed
4 4 2015 jan qual1 subject1 passed
5 5 2015 june qual2 subject2 passed
6 6 2015 june qual2 subject2 failed
First, I'd create a logical column indicating whether the subject was passed:
data <- data %>%
  # the comparison already yields TRUE/FALSE, so no ifelse() is needed
  mutate(pass = result == "passed")
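A logical column is handy here because R treats TRUE as 1 and FALSE as 0 in arithmetic, so summing it counts the passes and its mean is the pass proportion. A tiny base R sketch of that idea (the vector `pass_example` and the other names are made up for illustration, they are not part of the question's data):

```r
# Hypothetical vector for illustration only: three passes out of four
pass_example <- c(TRUE, FALSE, TRUE, TRUE)

# TRUE counts as 1, so sum() gives the number of passes
n_passed <- sum(pass_example)

# Pass rate as a percentage, written two equivalent ways
pass_rate     <- sum(pass_example) / length(pass_example) * 100
pass_rate_alt <- mean(pass_example) * 100
```

Inside a grouped `summarise()`, `n()` plays the role that `length()` plays here.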
Then summarise the grouped data:
data %>%
  group_by(academic_year, semester, qualification, module) %>%
  summarise(
    pass_rate = (sum(pass) / n()) * 100
  )
which gives:
academic_year semester qualification module pass_rate
<dbl> <chr> <chr> <chr> <dbl>
1 2014 jan qual1 subject1 100
2 2014 jan qual2 subject1 0
3 2015 jan qual1 subject1 100
4 2015 june qual2 subject2 50
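Since the question tried `aggregate`, here is a minimal base R sketch of the same summary, with the rate also formatted as a percentage string like the `62.54%` in the desired output. It rebuilds the toy data as a plain data.frame so it runs on its own; the names `pass_rates` and `pass_rate_label` are assumptions made for this sketch.

```r
# Same toy data as in the answer, rebuilt as a base data.frame (no packages needed)
data <- data.frame(
  academic_year = c(2014, 2014, 2014, 2015, 2015, 2015),
  semester      = c("jan", "jan", "jan", "jan", "june", "june"),
  qualification = c("qual1", "qual2", "qual1", "qual1", "qual2", "qual2"),
  module        = c("subject1", "subject1", "subject1", "subject1",
                    "subject2", "subject2"),
  result        = c("passed", "failed", "passed", "passed", "passed", "failed")
)
data$pass <- data$result == "passed"

# One row per year/semester/qualification/module combination present in the data;
# the mean of a logical vector is the proportion of TRUE values
pass_rates <- aggregate(
  pass ~ academic_year + semester + qualification + module,
  data = data,
  FUN  = function(x) mean(x) * 100
)
names(pass_rates)[names(pass_rates) == "pass"] <- "pass_rate"

# Format as text with two decimals and a percent sign ("%%" is a literal "%")
pass_rates$pass_rate_label <- sprintf("%.2f%%", pass_rates$pass_rate)

print(pass_rates)
```

For the `HE_Stats` data frame in the question, the grouping columns would presumably be `Year`, `ActualSemester`, `Qualification` and `ModuleCode`. The name and coding of the result column are not shown in the question, so the `result == "passed"` test would need adapting.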