I have two data tables.
The first table shows each mentor's name and all the students assigned to that mentor:
mentor student_name
Dr. Brown Michael
Dr. Brown Diana
Dr. Brown Peter
Dr. Brown Christopher
Dr. Brown Stacy
Ms. Lindblom Rose
Ms. Lindblom Anne
Ms. Lindblom Steven
Ms. Lindblom Gloria
Mr. Apple Juan
Mr. Apple Francis
Mr. Apple David
Mr. Apple Sonja
Mr. Apple Dakota
Mr. Apple Latoya
Mr. Apple Avril
Mr. Apple James
Mr. Apple Stewart
Mr. Apple Sophia
The second table shows the one-on-one tutoring sessions held between mentors and students:
mentor date_of_tutoring student_name
Dr. Brown 07/14/2019 Peter
Dr. Brown 07/15/2019 Christopher
Ms. Lindblom 06/28/2019 Gloria
Mr. Apple 06/20/2019 Sophia
Mr. Apple 06/22/2019 Latoya
Mr. Apple 06/25/2019 Juan
Mr. Apple 06/26/2019 Avril
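Each mentor needs to hold one mentoring session with each of their students during the school year.
I want to create a new table that shows what percentage of the mentoring each mentor has completed. A mentor has completed the task (100%) once they have held a one-on-one mentoring session with every one of their students.
For example, based on the data in table 2 and the number of students assigned to each mentor, I want to create a new table like this: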
teacher %_mentoring_completed
Dr. Brown 40%
Ms. Lindblom 25%
Mr. Apple 40%
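Written as a formula, and assuming (as in the sample data) that each row of the second table is a different assigned student, the completion percentage for a mentor m is the share of m's assigned students who have had a session. The expected numbers above follow directly:

```latex
\[
\text{completed}(m) = 100 \times
\frac{\text{number of students of } m \text{ with a session}}
     {\text{number of students assigned to } m}
\]
\[
\begin{aligned}
\text{Dr. Brown}    &: 100 \times 2/5  = 40\% \\
\text{Ms. Lindblom} &: 100 \times 1/4  = 25\% \\
\text{Mr. Apple}    &: 100 \times 4/10 = 40\%
\end{aligned}
\]
```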
Answer 0 (score: 1)
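One option is to left-join the two datasets, group by "mentor", and take the mean of the logical vector that marks which rows have a non-NA date_of_tutoring: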
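library(dplyr)
library(stringr)
# Keep every assigned student; students without a session get an NA date
left_join(df1, df2, by = c("mentor", "student_name")) %>%
  group_by(mentor) %>%
  # The mean of a logical vector is the share of TRUE values
  summarise(PercentageMentoringCompleted = str_c(100 *
      mean(!is.na(date_of_tutoring)), "%"))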
# A tibble: 3 x 2
# mentor PercentageMentoringCompleted
# <chr> <chr>
#1 Dr. Brown 40%
#2 Mr. Apple 40%
#3 Ms. Lindblom 25%
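A base R sketch that avoids the join altogether: it flags each assigned student who appears in at least one session with the same mentor, so a student with two sessions is still counted only once, and a mentor with no sessions at all gets 0%. The function name mentoring_completion and the column name pct_mentoring_completed are hypothetical, and the sketch assumes mentor and student names never contain a tab character (used to build the lookup key).

```r
# Hypothetical helper: share of each mentor's assigned students who have
# had at least one session with that mentor, formatted as "NN%".
mentoring_completion <- function(assigned, sessions) {
  # Build mentor/student keys; assumes names contain no tab characters.
  key <- function(d) paste(d$mentor, d$student_name, sep = "\t")
  # TRUE for each assigned student with at least one matching session;
  # %in% ignores duplicate sessions, so repeats are not double-counted.
  done <- key(assigned) %in% key(sessions)
  # Mean of a logical vector = proportion of TRUE values, per mentor.
  pct <- tapply(done, assigned$mentor, mean)
  data.frame(mentor = names(pct),
             pct_mentoring_completed = sprintf("%.0f%%", 100 * as.vector(pct)),
             stringsAsFactors = FALSE)
}

# Sample data from the question.
df1 <- data.frame(
  mentor = rep(c("Dr. Brown", "Ms. Lindblom", "Mr. Apple"), times = c(5, 4, 10)),
  student_name = c("Michael", "Diana", "Peter", "Christopher", "Stacy",
                   "Rose", "Anne", "Steven", "Gloria",
                   "Juan", "Francis", "David", "Sonja", "Dakota",
                   "Latoya", "Avril", "James", "Stewart", "Sophia"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  mentor = c("Dr. Brown", "Dr. Brown", "Ms. Lindblom", rep("Mr. Apple", 4)),
  date_of_tutoring = c("07/14/2019", "07/15/2019", "06/28/2019",
                       "06/20/2019", "06/22/2019", "06/25/2019", "06/26/2019"),
  student_name = c("Peter", "Christopher", "Gloria",
                   "Sophia", "Latoya", "Juan", "Avril"),
  stringsAsFactors = FALSE
)

result <- mentoring_completion(df1, df2)
print(result)
```

Because the percentage is computed over the rows of the assigned-students table, every mentor in df1 appears in the result, unlike the count version below, which uses inner_join and would drop a mentor with no sessions yet.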
Another option is to count the rows per mentor in each table with count and divide the two counts:
library(purrr)
list(df2, df1) %>% map(~ .x %>%
dplyr::count(mentor)) %>%
reduce(inner_join, by = 'mentor') %>%
transmute(mentor, perc = 100 * n.x/n.y)
Or, in base R, with merge and aggregate:
aggregate(PercentageMentoringCompleted ~ mentor,
transform(merge(df1, df2, all.x = TRUE),
PercentageMentoringCompleted = !is.na(date_of_tutoring)), mean)
## data used above
df1 <- structure(list(mentor = c("Dr. Brown", "Dr. Brown", "Dr. Brown",
"Dr. Brown", "Dr. Brown", "Ms. Lindblom", "Ms. Lindblom", "Ms. Lindblom",
"Ms. Lindblom", "Mr. Apple", "Mr. Apple", "Mr. Apple", "Mr. Apple",
"Mr. Apple", "Mr. Apple", "Mr. Apple", "Mr. Apple", "Mr. Apple",
"Mr. Apple"), student_name = c("Michael", "Diana", "Peter", "Christopher",
"Stacy", "Rose", "Anne", "Steven", "Gloria", "Juan", "Francis",
"David", "Sonja", "Dakota", "Latoya", "Avril", "James", "Stewart",
"Sophia")), class = "data.frame", row.names = c(NA, -19L))
df2 <- structure(list(mentor = c("Dr. Brown", "Dr. Brown", "Ms. Lindblom",
"Mr. Apple", "Mr. Apple", "Mr. Apple", "Mr. Apple"),
date_of_tutoring = c("07/14/2019",
"07/15/2019", "06/28/2019", "06/20/2019", "06/22/2019", "06/25/2019",
"06/26/2019"), student_name = c("Peter", "Christopher", "Gloria",
"Sophia", "Latoya", "Juan", "Avril")), class = "data.frame", row.names = c(NA,
-7L))
Answer 1 (score: 0)
We can use table to count how often each mentor appears, assuming the same set of unique mentors is present in both data frames:
stack(table(df2$mentor)/table(df1$mentor))
# values ind
#1 0.40 Dr. Brown
#2 0.40 Mr. Apple
#3 0.25 Ms. Lindblom
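If the mentors are not the same in both data frames, or appear in a different order, a safer option is to use factor with explicit levels so that both tables have the same entries in the correct order: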
stack(table(factor(df2$mentor, levels = unique(df1$mentor)))/
table(factor(df1$mentor, levels = unique(df1$mentor))))
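To illustrate what the factor/levels version guards against, here is a small check on hypothetical toy data in which one mentor has no sessions yet. Without shared levels, the session table would have fewer entries than the assignment table and the two would not line up; with levels = unique(assigned$mentor), both tables have one entry per assigned mentor and the mentor without sessions gets 0.

```r
# Hypothetical toy data: "Ms. Lindblom" has no sessions yet.
assigned <- data.frame(
  mentor = c("Dr. Brown", "Dr. Brown", "Ms. Lindblom", "Mr. Apple", "Mr. Apple"),
  student_name = c("Peter", "Diana", "Rose", "Juan", "Sophia"),
  stringsAsFactors = FALSE
)
sessions <- data.frame(
  mentor = c("Dr. Brown", "Mr. Apple"),
  student_name = c("Peter", "Sophia"),
  stringsAsFactors = FALSE
)

# Shared levels: one table entry per assigned mentor, in first-appearance order.
lv <- unique(assigned$mentor)
done  <- table(factor(sessions$mentor, levels = lv))  # keeps a 0 count for Ms. Lindblom
total <- table(factor(assigned$mentor, levels = lv))
ratio <- done / total  # element-wise, since both tables share the same levels
print(ratio)
```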