I have 4 datasets with different schemas.
I need to join them using a left-anti join.
I want to know whether there is a way to join all of them at once.
Answer 0 (score: 1)
This can be done with chained (nested) joins in Spark 2.4.3, so below I just sketch one way of implementing such chained joins.
First DataFrame
scala> val someDF = Seq(
("user1", "math","algebra-1","90"),
("user1", "physics","gravity","70"),
("user3", "biology","health","50"),
("user2", "biology","health","100"),
("user1", "math","algebra-1","40"),
("user2", "physics","gravity-2","20")
).toDF("user_id", "course_id","lesson_name","score")
scala> someDF.show
+-------+---------+-----------+-----+
|user_id|course_id|lesson_name|score|
+-------+---------+-----------+-----+
| user1| math| algebra-1| 90|
| user1| physics| gravity| 70|
| user3| biology| health| 50|
| user2| biology| health| 100|
| user1| math| algebra-1| 40|
| user2| physics| gravity-2| 20|
+-------+---------+-----------+-----+
Second DataFrame
scala> var someDF2 = Seq(("math",121),("physics",122),("biology",123)).toDF("sid","rno")
scala> someDF2.show
+-------+---+
| sid|rno|
+-------+---+
| math|121|
|physics|122|
|biology|123|
+-------+---+
Third DataFrame
scala> var someDF3 = Seq((121,"G-1"),(122,"G-2"),(123,"G-3")).toDF("rno","grade")
scala> someDF3.show
+---+-----+
|rno|grade|
+---+-----+
|121| G-1|
|122| G-2|
|123| G-3|
+---+-----+
scala> someDF.join(someDF2,col("course_id")===col("sid"),"inner").join(someDF3,Seq("rno"),"inner").show
+---+-------+---------+-----------+-----+-------+-----+
|rno|user_id|course_id|lesson_name|score| sid|grade|
+---+-------+---------+-----------+-----+-------+-----+
|121| user1| math| algebra-1| 90| math| G-1|
|122| user1| physics| gravity| 70|physics| G-2|
|123| user3| biology| health| 50|biology| G-3|
|123| user2| biology| health| 100|biology| G-3|
|121| user1| math| algebra-1| 40| math| G-1|
|122| user2| physics| gravity-2| 20|physics| G-2|
+---+-------+---------+-----------+-----+-------+-----+
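Since the question asks for left-anti joins, the same chaining works by simply swapping the join type. A minimal sketch, reusing the DataFrames above; the join keys are placeholders chosen only to show the pattern (with this toy data every course_id matches a sid, so the first anti-join already returns zero rows):
scala> // rows of someDF whose course_id has no match in someDF2
scala> val noCourseMatch = someDF.join(someDF2, someDF("course_id") === someDF2("sid"), "left_anti")
scala> // then drop rows whose score matches an rno in someDF3 (arbitrary placeholder condition)
scala> noCourseMatch.join(someDF3, col("score") === someDF3("rno"), "left_anti").show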
The data itself is not meaningful, but the pattern will serve your purpose. Let me know if you have any questions about this.
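If you want to drive the whole chain as a single expression over an arbitrary number of datasets, one possible pattern (only a sketch, reusing the toy DataFrames and the placeholder conditions above) is to fold over a list of (DataFrame, condition) pairs:
scala> val others = Seq(
  (someDF2, someDF("course_id") === someDF2("sid")),
  (someDF3, someDF("score") === someDF3("rno"))   // placeholder condition
)
scala> val remaining = others.foldLeft(someDF) { case (acc, (df, cond)) => acc.join(df, cond, "left_anti") }
Each left-anti join keeps only the columns of the left side, so adding a fourth dataset is just one more pair in the list.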