我正在使用两个数据集。数据集TestA和测试B(以下是如何在R中制作它们)
0
我想合并两个数据集(如果可能,不使用merge())这样,测试A的所有列都填充了TestB提供的信息,并且应该根据类和部分添加它。
我尝试使用合并(TestA,TestB,by = c(' Class',' Section'),all.x = TRUE)但它将观察结果添加到原始TestA。这只是一个测试,但在我使用的数据集中有数百个观察。当我使用这些较小的框架进行操作时,它可以工作,但更大的设置正在发生一些事情。这就是为什么我想知道是否有合并替代方案的原因。
关于如何做到这一点的任何想法?
输出应该如下所示
Instructor <- c('Mr.A','Mr.A','Mr.B', 'Mr.C', 'Mr.D')
Class <- c('French','French','English', 'Math', 'Geometry')
Section <- c('1','2','3','5','5')
Time <- c('9:00-10:00','10:00-11:00','9:00-10:00','9:00-10:00','10:00-11:00')
Date <- c('MWF','MWF','TR','TR','MWF')
Enrollment <- c('30','40','24','29','40')
TestA <- data.frame(Instructor,Class,Section,Time,Date,Enrollment)
rm(Instructor,Class,Section,Time,Date,Enrollment)
Student <- c("Frances","Cass","Fern","Pat","Peter","Kory","Cole")
ID <- c('123','121','101','151','456','789','314')
Instructor <- c('','','','','','','')
Time <- c('','','','','','','')
Date <- c('','','','','','','')
Enrollment <- c('','','','','','','')
Class <- c('French','French','French','French','English', 'Math', 'Geometry')
Section <- c('1','1','2','2','3','5','5')
TestB <- data.frame(Student, ID, Instructor, Class, Section, Time, Date, Enrollment)
rm(Instructor,Class,Section,Time,Date,Enrollment,ID,Student)
答案 0 :(得分:2)
在我了解merge()
dplyr
个join
函数之前,我曾经是library(dplyr)
TestA %>%
left_join(TestB, by = c("Class", "Section")) %>% #Here, you're joining by just the "Class" and "Section" columns of TestA and TestB
select(Class,
Section,
Instructor = Instructor.x,
Time = Time.x,
Date = Date.x,
Enrollment = Enrollment.x,
Student,
ID) %>%
arrange(Class, Section) #Added to match your output.
的忠实粉丝。
请改为尝试:
select
Class Section Instructor Time Date Enrollment Student ID
1 English 3 Mr.B 9:00-10:00 TR 24 Peter 456
2 French 1 Mr.A 9:00-10:00 MWF 30 Frances 123
3 French 1 Mr.A 9:00-10:00 MWF 30 Cass 121
4 French 2 Mr.A 10:00-11:00 MWF 40 Fern 101
5 French 2 Mr.A 10:00-11:00 MWF 40 Pat 151
6 Geometry 5 Mr.D 10:00-11:00 MWF 40 Cole 314
7 Math 5 Mr.C 9:00-10:00 TR 29 Kory 789
语句只保留那些专门命名的列,在某些情况下,重命名它们。
输出:
{{1}}
答案 1 :(得分:2)
关键是在合并/加入之前删除TestB
中空的但重复的列,如SymbolixAU所示。
以下是data.table
语法中的实现:
library(data.table)
setDT(TestB)[, .(Student, ID, Class, Section)][setDT(TestA), on = .(Class, Section)]
Student ID Class Section Instructor Time Date Enrollment
1: Frances 123 French 1 Mr.A 9:00-10:00 MWF 30
2: Cass 121 French 1 Mr.A 9:00-10:00 MWF 30
3: Fern 101 French 2 Mr.A 10:00-11:00 MWF 40
4: Pat 151 French 2 Mr.A 10:00-11:00 MWF 40
5: Peter 456 English 3 Mr.B 9:00-10:00 TR 24
6: Kory 789 Math 5 Mr.C 9:00-10:00 TR 29
7: Cole 314 Geometry 5 Mr.D 10:00-11:00 MWF 40