I have two tables of student data. The first table contains the grades that students obtained in three separate courses:
student_id course grade
1 English 6
1 maths 8
1 biology 6
2 English 5
2 maths 7
2 biology 6.5
The second table contains each student's average grade (across the three courses):
student_id average_grade
1 6.7
2 6.2
I would like a new table that looks like this, containing the average grade and the English grade:
student_id average_grade English
1 6.7 6
2 6.2 5
How can I get this third table?
Answer 0: (score: 3)
library(tidyverse)
df1<-data.frame(studentid = c(1,1,1,2,2,2), course = c('Eng', 'maths', 'bio','Eng' ,'maths', 'bio' ), grade = c(6,8,6,5,7,6.5))
df2<-data.frame(studentid = c(1,2), average_grade = c(6.7,6.2))
inner_join(df1, df2) %>%
spread(course, grade) %>%
select(studentid,average_grade,Eng)
Joining, by = "studentid"
studentid average_grade Eng
1 1 6.7 6
2 2 6.2 5
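For readers on a recent tidyr, here is a minimal sketch of the same idea using `pivot_wider()`, which the tidyr documentation recommends over `spread()` for new code. It assumes tidyr 1.0.0 or later; the object names `grades`, `averages` and `result` are made up for illustration, and the column names follow the question rather than the answer above.

```r
library(dplyr)
library(tidyr)

# Data as given in the question (object names are illustrative)
grades <- data.frame(
  student_id = c(1, 1, 1, 2, 2, 2),
  course = c("English", "maths", "biology", "English", "maths", "biology"),
  grade = c(6, 8, 6, 5, 7, 6.5)
)
averages <- data.frame(student_id = c(1, 2), average_grade = c(6.7, 6.2))

result <- grades %>%
  # one row per student, one column per course
  pivot_wider(names_from = course, values_from = grade) %>%
  # naming the key explicitly avoids the "Joining, by = ..." message
  inner_join(averages, by = "student_id") %>%
  select(student_id, average_grade, English)

print(result)
```

Passing `by = "student_id"` is optional here, since the key is the only shared column, but it makes the match explicit.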
Answer 1: (score: 2)
Maybe something like this, for example:
library(tidyverse)
d1 <- data.frame(id = c(1,1,2,2), course = c("English", "Math", "English", "Math"), grade = c(6,8,5,7))
d2 <- data.frame(id = c(1,2), avg = c(6.7, 6.2))
merge(d1, d2) %>% filter(course == "English") %>% spread(course, grade)
id avg English
1 1 6.7 6
2 2 6.2 5
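Answer 2: (score: 1)
Do it like this:
library(tidyverse)
df1 <- tibble(id = c(1,1,1,2,2,2), course = c("English","maths","biology","English","maths","biology"),
              grade = c(6,8,6,5,7,6.5))
df2 <- tibble(id = c(1,2), average_grade = c(6.7,6.2))
# Keep only the English rows, then reduce to one English grade per student,
# so that no leftover course column ends up in the merged result
df0 <- df1 %>% filter(course == "English") %>% group_by(id) %>% summarize(English = mean(grade))
# Join the English grades to the averages on the shared id column
merge(df0, df2, by = "id")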
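Answer 3: (score: 1)
Of all the resources I have come across, I think the one linked at the end of this answer is among the best I have seen on merging data frames.
Use the merge function and its optional parameters:
Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you are matching only on the fields you want. You can also use the by.x and by.y parameters if the matching variables have different names in the two data frames.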
Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)
Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)
Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)
Cross join: merge(x = df1, y = df2, by = NULL)
How to join (merge) data frames (inner, outer, left, right)?
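Applied to the student tables from the question, the merge options above might look like the sketch below. It uses base R only. The key column is deliberately named differently in the two frames (`id` vs `student_id`) to show `by.x` and `by.y`, and student 3 is a hypothetical extra row, not part of the question, added so that the left join behaves differently from the inner join.

```r
# Grades from the question; object names are illustrative
grades <- data.frame(
  student_id = c(1, 1, 1, 2, 2, 2),
  course = c("English", "maths", "biology", "English", "maths", "biology"),
  grade = c(6, 8, 6, 5, 7, 6.5)
)
# Assumption: the key is called "id" here, and student 3 is invented
averages <- data.frame(id = c(1, 2, 3), average_grade = c(6.7, 6.2, 7.0))

# Keep the English rows only and rename the grade column
english <- grades[grades$course == "English", c("student_id", "grade")]
names(english)[names(english) == "grade"] <- "English"

# Inner join: only students present in both frames
inner <- merge(averages, english, by.x = "id", by.y = "student_id")

# Left outer join: every row of averages is kept, English is NA for student 3
left <- merge(averages, english, by.x = "id", by.y = "student_id", all.x = TRUE)

print(inner)
print(left)
```

When `by.x` and `by.y` differ, the key column in the result takes its name from the first frame, so both results here have the columns `id`, `average_grade` and `English`.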