Newbie having trouble building a script for a report

Date: 2018-12-06 08:44:54

Tags: r

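I am new to R and to coding. I have been trying to solve a problem with a report I am putting together, and I have hit a wall. I have spent the past two days looking for a workable answer and am now at my wits' end.

I have a data frame of student results. The columns are as follows:

  1. Student number
  2. Academic year, e.g. 2014, 2015, etc.
  3. Semester, e.g. January or June
  4. Qualification, e.g. Qual1, Qual2, etc.
  5. Module, e.g. Subject1, Subject2, etc. The catch here is that Subject1 may be in both Qual1 and Qual2, while Subject2 may be in Qual1 only
  6. Result. This is either "Passed" or "Failed"

I am trying to create a summary/list that shows the pass percentage for each module in which students were active. Something like this: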

Year Semester Qualification Module    PassRate
2014 Jan      Qual1         Subject1  62.54%
2014 Jan      Qual1         Subject2  72.81%
.
.
.
2014 July     Qual1         Subject1  69.51%
.
.
2014 Jan      Qual2         Subject1  42.86%
2014 Jan      Qual2         Subject3  55.95%
etc.

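I thought an IF statement might work, but that seems too cumbersome. I also looked at for-each loops, but I cannot figure out how to make them work, or how to combine the two. I have tried aggregate, count =, cbind and everything else my good friend Google could find.

I have the following code: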

AcademicYears <- as.character(unique(unlist(HE_Stats$Year)))
AcademicYears_count <- NROW(AcademicYears)

AcademicSemesters <- as.character(unique(unlist(HE_Stats$ActualSemester)))
AcademicSemesters_count <- NROW(AcademicSemesters)

Qualifications <- as.character(unique(unlist(HE_Stats$Qualification)))
Qualifications_count <- NROW(Qualifications)

Modules <- as.character(unique(unlist(HE_Stats$ModuleCode)))
Modules_count <- NROW(Modules)

df <- HE_Stats %>% 
group_by(Year,ActualSemester,Qualification, ModuleCode) %>%

aggregate(cbind(count = AcademicSemesters) ~ AcademicYears,
data = HE_Stats,
FUN = function(AcademicSemesters){NROW(AcademicSemesters)})

The result is that it shows me one semester for each year. My latest plan is to build a matrix column by column.

1 answer:

Answer 0 (score: 0)

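It would be easier to give you a better answer if you provided sample data. But suppose your data looks like this (this solution uses the dplyr package):

library(dplyr)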

data <- tibble(student_number = c(1, 2, 3, 4, 5, 6),
           academic_year = c(2014, 2014, 2014, 2015, 2015, 2015),
           semester = c("jan", "jan", "jan","jan", "june", "june"),
           qualification = c("qual1", "qual2", "qual1", "qual1", "qual2",
                             "qual2"),
           module = c("subject1", "subject1", "subject1", "subject1", 
                       "subject2", "subject2"),
           result = c("passed", "failed", "passed", "passed", "passed", 
                      "failed"))

# A tibble: 6 x 6
  student_number academic_year semester qualification module   result  
           <dbl>         <dbl> <chr>    <chr>         <chr>    <chr> 
1              1          2014 jan      qual1         subject1 passed
2              2          2014 jan      qual2         subject1 failed
3              3          2014 jan      qual1         subject1 passed
4              4          2015 jan      qual1         subject1 passed
5              5          2015 june     qual2         subject2 passed
6              6          2015 june     qual2         subject2 failed

First, I would add a logical column that records whether the module was passed:

data <- data %>%
  # the comparison already yields TRUE/FALSE, so ifelse() is not needed
  mutate(pass = result == "passed")

Then summarise the grouped data:

data %>%
  group_by(academic_year, semester, qualification, module) %>%
  summarise(
    pass_rate = (sum(pass)/n())*100
  )

which gives:

  academic_year semester qualification module   pass_rate
          <dbl> <chr>    <chr>         <chr>        <dbl>
1          2014 jan      qual1         subject1       100
2          2014 jan      qual2         subject1         0
3          2015 jan      qual1         subject1       100
4          2015 june     qual2         subject2        50
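
The same approach can be sketched against the column names used in the question (Year, ActualSemester, Qualification, ModuleCode), with the pass rate formatted as a percentage string as in the desired output. This is only a sketch under assumptions: the name of the result column (`Result` here) and its values ("Passed"/"Failed") are guesses, since the question does not show them, and the small `HE_Stats` below is made-up stand-in data. A base R `aggregate()` variant is included because the question attempted that route.

```r
library(dplyr)

# Made-up stand-in for the asker's HE_Stats data frame.
# ASSUMPTION: the result column is called `Result` and holds "Passed"/"Failed".
HE_Stats <- tibble(
  Year           = c(2014, 2014, 2014, 2014, 2014),
  ActualSemester = c("Jan", "Jan", "Jan", "Jan", "Jan"),
  Qualification  = c("Qual1", "Qual1", "Qual1", "Qual2", "Qual2"),
  ModuleCode     = c("Subject1", "Subject1", "Subject1", "Subject1", "Subject1"),
  Result         = c("Passed", "Failed", "Passed", "Failed", "Failed")
)

# dplyr: one row per Year/Semester/Qualification/Module combination.
# mean() of a logical vector is the proportion of TRUE values.
pass_rates <- HE_Stats %>%
  group_by(Year, ActualSemester, Qualification, ModuleCode) %>%
  summarise(PassRate = mean(Result == "Passed") * 100) %>%
  ungroup() %>%
  # format as text with two decimals and a percent sign
  mutate(PassRate = sprintf("%.2f%%", PassRate))

print(pass_rates)

# Base R alternative using aggregate() with a formula:
# the left-hand side is the value to summarise, the right-hand side the groups.
HE_Stats$pass <- HE_Stats$Result == "Passed"
base_rates <- aggregate(
  pass ~ Year + ActualSemester + Qualification + ModuleCode,
  data = HE_Stats,
  FUN  = function(x) mean(x) * 100
)

print(base_rates)
```

With this stand-in data, Subject1 comes out at 66.67% for Qual1 (two passes out of three) and 0.00% for Qual2, which shows that the same module is counted separately per qualification. Note that `sprintf()` turns the rate into text, so keep the numeric column if further calculations are needed.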