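Is there a way to join more than two datasets at once?

Time: 2019-09-10 12:33:38

Tags: apache-spark hadoop

I have 4 datasets with different schemas, and I need to combine them using left-anti joins. Is there a way to join all of them in one go?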

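1 answer:

Answer 0: (score: 1)

This is a nested (chained) join, run on Spark 2.4.3. The example below uses arbitrary data just to illustrate how joins can be nested.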

  

First DataFrame

scala> val someDF = Seq(
  ("user1", "math", "algebra-1", "90"),
  ("user1", "physics", "gravity", "70"),
  ("user3", "biology", "health", "50"),
  ("user2", "biology", "health", "100"),
  ("user1", "math", "algebra-1", "40"),
  ("user2", "physics", "gravity-2", "20")
).toDF("user_id", "course_id", "lesson_name", "score")

scala> someDF.show
+-------+---------+-----------+-----+
|user_id|course_id|lesson_name|score|
+-------+---------+-----------+-----+
|  user1|     math|  algebra-1|   90|
|  user1|  physics|    gravity|   70|
|  user3|  biology|     health|   50|
|  user2|  biology|     health|  100|
|  user1|     math|  algebra-1|   40|
|  user2|  physics|  gravity-2|   20|
+-------+---------+-----------+-----+
  

Second DataFrame

scala> val someDF2 = Seq(("math", 121), ("physics", 122), ("biology", 123)).toDF("sid", "rno")

scala> someDF2.show
+-------+---+
|    sid|rno|
+-------+---+
|   math|121|
|physics|122|
|biology|123|
+-------+---+
  

Third DataFrame

scala> val someDF3 = Seq((121, "G-1"), (122, "G-2"), (123, "G-3")).toDF("rno", "grade")

scala> someDF3.show
+---+-----+
|rno|grade|
+---+-----+
|121|  G-1|
|122|  G-2|
|123|  G-3|
+---+-----+

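Chained join of all three DataFrames: the first join matches course_id against sid, and the second joins the result to the third DataFrame on the shared rno column.

scala> someDF.join(someDF2, col("course_id") === col("sid"), "inner").join(someDF3, Seq("rno"), "inner").show
+---+-------+---------+-----------+-----+-------+-----+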
|rno|user_id|course_id|lesson_name|score|    sid|grade|
+---+-------+---------+-----------+-----+-------+-----+
|121|  user1|     math|  algebra-1|   90|   math|  G-1|
|122|  user1|  physics|    gravity|   70|physics|  G-2|
|123|  user3|  biology|     health|   50|biology|  G-3|
|123|  user2|  biology|     health|  100|biology|  G-3|
|121|  user1|     math|  algebra-1|   40|   math|  G-1|
|122|  user2|  physics|  gravity-2|   20|physics|  G-2|
+---+-------+---------+-----------+-----+-------+-----+
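
Since the question asks about left-anti joins rather than inner joins, the same chaining pattern can be sketched with the `left_anti` join type. This is only a minimal sketch under stated assumptions: the DataFrames `scores`, `archived`, and `cancelled`, their columns, and their contents are hypothetical and not from the question. It is written in script style (for example, pasted into spark-shell with `:paste`). Note that Spark expects the join type string `left_anti` (or `leftanti`), not `left-anti`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for the sketch; spark-shell already provides one named `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("left-anti-chain")
  .getOrCreate()
import spark.implicits._

// Hypothetical data: one "main" DataFrame and two exclusion lists.
val scores = Seq(
  ("user1", "math"),
  ("user2", "physics"),
  ("user3", "biology"),
  ("user4", "chemistry")
).toDF("user_id", "course_id")
val archived  = Seq("math").toDF("sid")      // courses to exclude
val cancelled = Seq("physics").toDF("cid")   // more courses to exclude

// A left-anti join keeps only the left rows that have NO match on the right,
// and the result carries only the left side's columns. Each later join in the
// chain can therefore reference only columns of `scores`.
val remaining = scores
  .join(archived,  col("course_id") === col("sid"), "left_anti")
  .join(cancelled, col("course_id") === col("cid"), "left_anti")

remaining.show()
```

With this data, `remaining` keeps only the `user3` and `user4` rows, and its schema is still just `user_id` and `course_id`. Because a left-anti join drops the right-hand columns, the order of the anti joins in the chain does not change which rows survive.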

The data itself has no particular meaning, but the approach should serve your purpose. Let me know if you have any questions about it.
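
Spark's `join` always takes exactly two DataFrames, so "joining everything at once" still means a chain of pairwise joins. With four or more inputs, one way to avoid writing the chain by hand is to fold over a list of join steps. The following is a minimal sketch, assuming each step can be described by a right-hand DataFrame, the shared column names to join on, and a join type. The `steps` list and the column names are illustrative assumptions, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Local session for the sketch; spark-shell already provides one named `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("fold-joins")
  .getOrCreate()
import spark.implicits._

// Hypothetical data, modeled loosely on the DataFrames above.
val scores  = Seq(("user1", "math", "90"), ("user2", "physics", "20"), ("user3", "biology", "50"))
  .toDF("user_id", "course_id", "score")
val courses = Seq(("math", 121), ("physics", 122), ("biology", 123)).toDF("course_id", "rno")
val grades  = Seq((121, "G-1"), (122, "G-2"), (123, "G-3")).toDF("rno", "grade")

// Each step: (right-hand DataFrame, columns to join on, join type).
val steps: Seq[(DataFrame, Seq[String], String)] = Seq(
  (courses, Seq("course_id"), "inner"),
  (grades,  Seq("rno"),       "inner")
)

// Start from `scores` and apply the joins one after another.
val joined = steps.foldLeft(scores) { case (left, (right, keys, how)) =>
  left.join(right, keys, how)
}

joined.show()
```

This builds the same kind of plan as the hand-written chain; it only makes the list of joins data-driven. Swapping `"inner"` for `"left_anti"` in a step works too, as long as the join columns of every later step still exist on the left side.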