Question

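I am working with two datasets, TestA and TestB (the R code to create them is below).

I want to merge the two datasets (without using merge(), if possible) so that all the columns of TestA are filled with the information provided by TestB, matched on Class and Section.

I tried merge(TestA, TestB, by = c('Class', 'Section'), all.x = TRUE), but it adds observations to the original TestA. This is only a test; the dataset I am actually working with has hundreds of observations. The call works on these smaller frames, but something goes wrong with the larger set. That is why I would like to know whether there is an alternative to merge.

Any ideas on how to do this?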

The output should look like this

Instructor <- c('Mr.A','Mr.A','Mr.B', 'Mr.C', 'Mr.D')
Class <- c('French','French','English', 'Math', 'Geometry')
Section <- c('1','2','3','5','5')
Time <- c('9:00-10:00','10:00-11:00','9:00-10:00','9:00-10:00','10:00-11:00')
Date <- c('MWF','MWF','TR','TR','MWF')
Enrollment <- c('30','40','24','29','40')

TestA <- data.frame(Instructor,Class,Section,Time,Date,Enrollment)

rm(Instructor,Class,Section,Time,Date,Enrollment)

Student <- c("Frances","Cass","Fern","Pat","Peter","Kory","Cole")
ID <- c('123','121','101','151','456','789','314')
Instructor <- c('','','','','','','')
Time <- c('','','','','','','')
Date <- c('','','','','','','')
Enrollment <- c('','','','','','','')
Class <- c('French','French','French','French','English', 'Math', 'Geometry')
Section <- c('1','1','2','2','3','5','5')


TestB <- data.frame(Student, ID, Instructor, Class, Section, Time, Date, Enrollment)

rm(Instructor,Class,Section,Time,Date,Enrollment,ID,Student)

Answer 1

I used to be a big fan of merge() until I learned about dplyr's join functions.

Try this instead:

library(dplyr)

TestA %>%
  # Join on just the "Class" and "Section" columns of TestA and TestB
  left_join(TestB, by = c("Class", "Section")) %>%
  select(Class, Section, Instructor = Instructor.x, Time = Time.x,
         Date = Date.x, Enrollment = Enrollment.x, Student, ID) %>%
  # Added to match your output
  arrange(Class, Section)

Output:

     Class Section Instructor        Time Date Enrollment Student  ID
1  English       3       Mr.B  9:00-10:00   TR         24   Peter 456
2   French       1       Mr.A  9:00-10:00  MWF         30 Frances 123
3   French       1       Mr.A  9:00-10:00  MWF         30    Cass 121
4   French       2       Mr.A 10:00-11:00  MWF         40    Fern 101
5   French       2       Mr.A 10:00-11:00  MWF         40     Pat 151
6 Geometry       5       Mr.D 10:00-11:00  MWF         40    Cole 314
7     Math       5       Mr.C  9:00-10:00   TR         29    Kory 789

The select statement keeps only the columns that are specifically named and, in some cases, renames them.
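As for why a join can return more rows than TestA: a left join repeats each left-hand row once per matching right-hand row, so a Class/Section combination that occurs several times in the other table increases the row count. Whether this explains the behaviour on the larger dataset is only a guess. The sketch below uses made-up frames (`left_df` and `right_df` are hypothetical names, not from the question) to show one way to check for repeated keys.

```r
# Hypothetical toy data: 'right_df' has two rows for the key (French, 1).
left_df <- data.frame(Class = c("French", "Math"),
                      Section = c("1", "5"),
                      Instructor = c("Mr.A", "Mr.C"),
                      stringsAsFactors = FALSE)
right_df <- data.frame(Class = c("French", "French", "Math"),
                       Section = c("1", "1", "5"),
                       Student = c("Frances", "Cass", "Kory"),
                       stringsAsFactors = FALSE)

keys <- c("Class", "Section")

# Flag every row whose key combination occurs more than once.
is_dup <- duplicated(right_df[keys]) |
  duplicated(right_df[keys], fromLast = TRUE)
dup_rows <- right_df[is_dup, ]
print(dup_rows)

# The left join repeats a left-hand row once per matching right-hand row,
# so the result has more rows than 'left_df' when keys repeat.
joined <- merge(left_df, right_df, by = keys, all.x = TRUE)
print(nrow(left_df))
print(nrow(joined))
```

If `dup_rows` is empty for both tables, repeated keys are not the cause and the extra rows come from somewhere else.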

Answer 2

The key is to drop the empty but duplicated columns from TestB before merging/joining, as shown by SymbolixAU.

Here is an implementation in data.table syntax:

library(data.table)
setDT(TestB)[, .(Student, ID, Class, Section)][setDT(TestA), on = .(Class, Section)]

   Student  ID    Class Section Instructor        Time Date Enrollment
1: Frances 123   French       1       Mr.A  9:00-10:00  MWF         30
2:    Cass 121   French       1       Mr.A  9:00-10:00  MWF         30
3:    Fern 101   French       2       Mr.A 10:00-11:00  MWF         40
4:     Pat 151   French       2       Mr.A 10:00-11:00  MWF         40
5:   Peter 456  English       3       Mr.B  9:00-10:00   TR         24
6:    Kory 789     Math       5       Mr.C  9:00-10:00   TR         29
7:    Cole 314 Geometry       5       Mr.D 10:00-11:00  MWF         40
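Since the question asks for an alternative to merge(), the same idea can also be sketched in base R with match(). This is not the approach from either answer, just an illustration. It assumes each Class/Section combination occurs at most once in TestA and that the separator character never appears in the key columns. The frames are cut-down copies of the question's data so the block runs on its own.

```r
# Cut-down versions of the question's frames (empty columns already dropped).
TestA <- data.frame(
  Instructor = c("Mr.A", "Mr.A", "Mr.B"),
  Class      = c("French", "French", "English"),
  Section    = c("1", "2", "3"),
  Time       = c("9:00-10:00", "10:00-11:00", "9:00-10:00"),
  stringsAsFactors = FALSE
)
TestB <- data.frame(
  Student = c("Frances", "Cass", "Fern", "Peter"),
  ID      = c("123", "121", "101", "456"),
  Class   = c("French", "French", "French", "English"),
  Section = c("1", "1", "2", "3"),
  stringsAsFactors = FALSE
)

# Build one lookup key per row. "|" is an arbitrary separator that is
# assumed not to occur in Class or Section.
key_a <- paste(TestA$Class, TestA$Section, sep = "|")
key_b <- paste(TestB$Class, TestB$Section, sep = "|")

# For each student row, find the first row of TestA with the same key
# (NA when there is no match).
idx <- match(key_b, key_a)

# Copy the non-key columns of TestA across, row by row.
fill_cols <- setdiff(names(TestA), c("Class", "Section"))
result <- cbind(TestB, TestA[idx, fill_cols, drop = FALSE])
rownames(result) <- NULL
print(result)
```

Because match() returns only the first hit, this keeps exactly one row per student, but it silently ignores any later duplicates of a key in TestA.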

Best way to merge two datasets (maybe a function?)
