R: Merge two tables but use a subset of columns

Time: 2018-02-20 11:21:53

Tags: r merge subset

I have two tables of student data. The first table contains the grades each student obtained in three separate courses:

student_id    course             grade
1             English            6
1             maths              8
1             biology            6
2             English            5
2             maths              7
2             biology            6.5

The second table contains each student's average grade (across the three courses).

student_id    average_grade
1             6.7
2             6.2

I would like a new table that looks like this, containing the average grade and the English grade:

student_id    average_grade     English
1             6.7               6
2             6.2               5

How can I obtain this third table?

4 answers:

Answer 0: (score: 3)

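library(tidyverse)

# Sample data: one row per student and course, plus one average per student
df1 <- data.frame(studentid = c(1, 1, 1, 2, 2, 2),
                  course = c('Eng', 'maths', 'bio', 'Eng', 'maths', 'bio'),
                  grade = c(6, 8, 6, 5, 7, 6.5))
df2 <- data.frame(studentid = c(1, 2), average_grade = c(6.7, 6.2))

# Join on the shared studentid column, spread the courses into columns,
# then keep only the columns of interest
inner_join(df1, df2) %>%
  spread(course, grade) %>%
  select(studentid, average_grade, Eng)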

Joining, by = "studentid"
  studentid average_grade Eng
1         1           6.7   6
2         2           6.2   5
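In current versions of tidyr, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same pipeline using pivot_wider(), assuming tidyr 1.0.0 or later; the name result is only used here for illustration, and by = "studentid" is spelled out so the join key is explicit.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the answer above
df1 <- data.frame(
  studentid = c(1, 1, 1, 2, 2, 2),
  course = c("Eng", "maths", "bio", "Eng", "maths", "bio"),
  grade = c(6, 8, 6, 5, 7, 6.5)
)
df2 <- data.frame(studentid = c(1, 2), average_grade = c(6.7, 6.2))

# Join, turn each course into its own column, keep only the wanted columns
result <- inner_join(df1, df2, by = "studentid") %>%
  pivot_wider(names_from = course, values_from = grade) %>%
  select(studentid, average_grade, Eng)

print(result)
```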

Answer 1: (score: 2)

Maybe something like this, for example:

library(tidyverse)
d1 <- data.frame(id = c(1,1,2,2), course = c("English", "Math", "English", "Math"), grade = c(6,8,5,7))
d2 <- data.frame(id = c(1,2), avg = c(6.7, 6.2))
merge(d1, d2) %>% filter(course == "English") %>% spread(course, grade)

  id avg English
1  1 6.7       6
2  2 6.2       5
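The same idea (filter to the English rows first, then merge) can also be sketched in base R without the tidyverse. This is only an illustration on the sample data above; the intermediate names english and result are assumptions made for the example.

```r
# Same sample data as above
d1 <- data.frame(id = c(1, 1, 2, 2),
                 course = c("English", "Math", "English", "Math"),
                 grade = c(6, 8, 5, 7))
d2 <- data.frame(id = c(1, 2), avg = c(6.7, 6.2))

# Keep only the English rows and only the columns needed for the merge
english <- subset(d1, course == "English", select = c(id, grade))

# Rename grade so the merged table has an English column
names(english)[names(english) == "grade"] <- "English"

# Merge the averages with the English grades on the shared id column
result <- merge(d2, english, by = "id")
print(result)
```

Subsetting before merging keeps the unwanted course rows and columns out of the result, so no reshaping step is needed afterwards.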

Answer 2: (score: 1)

Do it like this:

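library(tidyverse)

df1 <- tibble(id = c(1, 1, 1, 2, 2, 2),
              course = c("English", "maths", "biology", "English", "maths", "biology"),
              grade = c(6, 8, 6, 5, 7, 6.5))
df2 <- tibble(id = c(1, 2), average_grade = c(6.7, 6.2))

# Keep the English rows, then reduce to one English value per student;
# grouping by id only means the course column is dropped from the result
df0 <- df1 %>%
  filter(course == "English") %>%
  group_by(id) %>%
  summarize(English = mean(grade))

merge(df0, df2, by = "id")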

Answer 3: (score: 1)

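Of all the resources I have come across, I think this is one of the best I have seen on merging data frames.

Use the merge function and its optional parameters: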

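Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you will most likely want to specify merge(df1, df2, by = "CustomerId") to make sure you match only on the fields you want. You can also use the by.x and by.y parameters if the matching variables have different names in the different data frames.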

Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)

Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)

Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)

Cross join: merge(x = df1, y = df2, by = NULL)
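To make the join types above concrete for the student tables, here is a small sketch. The data is made up for illustration: students 3 and 4 are hypothetical additions so that the two tables do not match completely, and the key column in the second table is deliberately named sid to show by.x and by.y.

```r
# Hypothetical data: student 3 has no average, student 4 has no English grade
grades <- data.frame(student_id = c(1, 2, 3), English = c(6, 5, 7))
averages <- data.frame(sid = c(1, 2, 4), average_grade = c(6.7, 6.2, 7.1))

# Inner join: only students present in both tables
inner <- merge(grades, averages, by.x = "student_id", by.y = "sid")

# Outer join: all students from either table, NA where data is missing
full <- merge(grades, averages, by.x = "student_id", by.y = "sid", all = TRUE)

# Left outer: every row of grades is kept
left <- merge(grades, averages, by.x = "student_id", by.y = "sid", all.x = TRUE)

# Right outer: every row of averages is kept
right <- merge(grades, averages, by.x = "student_id", by.y = "sid", all.y = TRUE)

# Cross join: every row of grades paired with every row of averages
cross <- merge(grades, averages, by = NULL)

# Row counts per join type
print(c(inner = nrow(inner), full = nrow(full), left = nrow(left),
        right = nrow(right), cross = nrow(cross)))
```

When by.x and by.y differ, the key column in the merged result takes its name from the first table (student_id here).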

How to join (merge) data frames (inner, outer, left, right)?