在密钥上加入Spark数据帧

时间:2016-10-31 13:54:41

标签: scala apache-spark dataframe apache-spark-sql

我构建了两个数据帧。我们如何加入多个Spark数据帧?

例如:

PersonDfProfileDf的公共列为personId,为(键)。     现在,我们如何才能将PersonDfProfileDf合并为一个数据帧?

8 个答案:

答案 0 :(得分:41)

使用scala(this is example given for older version of spark for spark 2.x see my other answer)的别名方法:

您可以使用案例类来准备样本数据集...... 这是ex的可选项:您也可以从DataFrame获得hiveContext.sql

import org.apache.spark.sql.functions.col

case class Person(name: String, age: Int, personid : Int)

case class Profile(name: String, personid  : Int , profileDescription: String)

    val df1 = sqlContext.createDataFrame(
   Person("Bindu",20,  2) 
:: Person("Raphel",25, 5) 
:: Person("Ram",40, 9):: Nil)


val df2 = sqlContext.createDataFrame(
Profile("Spark",2,  "SparkSQLMaster") 
:: Profile("Spark",5, "SparkGuru") 
:: Profile("Spark",9, "DevHunter"):: Nil
)

// you can do alias to refer column name with aliases to  increase readablity

val df_asPerson = df1.as("dfperson")
val df_asProfile = df2.as("dfprofile")


val joined_df = df_asPerson.join(
    df_asProfile
, col("dfperson.personid") === col("dfprofile.personid")
, "inner")


joined_df.select(
  col("dfperson.name")
, col("dfperson.age")
, col("dfprofile.name")
, col("dfprofile.profileDescription"))
.show

样本临时表方法我个人不喜欢...

The reason to use the registerTempTable( tableName ) method for a DataFrame, is so that in addition to being able to use the Spark-provided methods of a DataFrame, you can also issue SQL queries via the sqlContext.sql( sqlQuery ) method, that use that DataFrame as an SQL table. The tableName parameter specifies the table name to use for that DataFrame in the SQL queries.

df_asPerson.registerTempTable("dfperson");
df_asProfile.registerTempTable("dfprofile")

sqlContext.sql("""SELECT dfperson.name, dfperson.age, dfprofile.profileDescription
                  FROM  dfperson JOIN  dfprofile
                  ON dfperson.personid == dfprofile.personid""")

如果你想了解更多关于联接的信息请参阅这篇文章:beyond-traditional-join-with-apache-spark

enter image description here

  

注意:   1)如 @RaphaelRoth 所述,

     

val resultDf = PersonDf.join(ProfileDf,Seq("personId"))很好   因为如果你使用同一个表的内连接,它不会有来自双方的重复列   2)Spark 2.x示例在另一个带有完整连接集的答案中更新   spark 2.x支持的操作,带有示例+结果

提示:

此外,连接中的重要事项:broadcast function can help to give hint please see my answer

答案 1 :(得分:16)

你可以使用

val resultDf = PersonDf.join(ProfileDf, PersonDf("personId") === ProfileDf("personId"))

或更短且更灵活(因为您可以轻松指定多于1列的连接)

val resultDf = PersonDf.join(ProfileDf,Seq("personId"))

答案 2 :(得分:3)

除了上面的答案外,我还尝试使用spark 2.x here is my linked in article with full examples and explanation 演示具有相同案例类的所有spark连接。

所有联接类型: 默认值为inner。必须是以下之一:                      innercrossouterfullfull_outerleftleft_outerright,{{ 1}},right_outerleft_semi

left_anti

结果:

First example inner join  
+---------------+---+-----------+------------------+
|           name|age|profileName|profileDescription|
+---------------+---+-----------+------------------+
|        Nataraj| 45|      Spark|    SparkSQLMaster|
|       Srinivas| 45|      Spark|         SparkGuru|
|          Ashik| 22|      Spark|         DevHunter|
|          Madhu| 22|      Spark|        Evangelist|
|         Meghna| 22|      Spark|    SparkSQLMaster|
|        Snigdha| 22|      Spark|    SparkSQLMaster|
|           Ravi| 42|      Spark|         Committer|
|            Ram| 42|      Spark|         DevHunter|
|Chidananda Raju| 35|      Spark|         DevHunter|
|Sreekanth Doddy| 29|      Spark|         DevHunter|
+---------------+---+-----------+------------------+

all joins in a loop
INNER JOIN
+--------+---------------+---+-----------+------------------+
|personid|           name|age|profileName|profileDescription|
+--------+---------------+---+-----------+------------------+
|       0|           Ravi| 42|      Spark|         Committer|
|       2|        Snigdha| 22|      Spark|    SparkSQLMaster|
|       2|         Meghna| 22|      Spark|    SparkSQLMaster|
|       2|        Nataraj| 45|      Spark|    SparkSQLMaster|
|       3|          Madhu| 22|      Spark|        Evangelist|
|       5|       Srinivas| 45|      Spark|         SparkGuru|
|       9|            Ram| 42|      Spark|         DevHunter|
|       9|          Ashik| 22|      Spark|         DevHunter|
|       9|Chidananda Raju| 35|      Spark|         DevHunter|
|       9|Sreekanth Doddy| 29|      Spark|         DevHunter|
+--------+---------------+---+-----------+------------------+

OUTER JOIN
+--------+---------------+----+-----------+------------------+
|personid|           name| age|profileName|profileDescription|
+--------+---------------+----+-----------+------------------+
|       0|           Ravi|  42|      Spark|         Committer|
|       1|           null|null|      Spark|       All Rounder|
|       2|        Nataraj|  45|      Spark|    SparkSQLMaster|
|       2|        Snigdha|  22|      Spark|    SparkSQLMaster|
|       2|         Meghna|  22|      Spark|    SparkSQLMaster|
|       3|          Madhu|  22|      Spark|        Evangelist|
|       4|       Siddhika|  22|       null|              null|
|       5|       Srinivas|  45|      Spark|         SparkGuru|
|       6|       Harshita|  22|       null|              null|
|       8|      Deekshita|  22|       null|              null|
|       9|          Ashik|  22|      Spark|         DevHunter|
|       9|            Ram|  42|      Spark|         DevHunter|
|       9|Chidananda Raju|  35|      Spark|         DevHunter|
|       9|Sreekanth Doddy|  29|      Spark|         DevHunter|
+--------+---------------+----+-----------+------------------+

FULL JOIN
+--------+---------------+----+-----------+------------------+
|personid|           name| age|profileName|profileDescription|
+--------+---------------+----+-----------+------------------+
|       0|           Ravi|  42|      Spark|         Committer|
|       1|           null|null|      Spark|       All Rounder|
|       2|        Nataraj|  45|      Spark|    SparkSQLMaster|
|       2|         Meghna|  22|      Spark|    SparkSQLMaster|
|       2|        Snigdha|  22|      Spark|    SparkSQLMaster|
|       3|          Madhu|  22|      Spark|        Evangelist|
|       4|       Siddhika|  22|       null|              null|
|       5|       Srinivas|  45|      Spark|         SparkGuru|
|       6|       Harshita|  22|       null|              null|
|       8|      Deekshita|  22|       null|              null|
|       9|          Ashik|  22|      Spark|         DevHunter|
|       9|            Ram|  42|      Spark|         DevHunter|
|       9|Sreekanth Doddy|  29|      Spark|         DevHunter|
|       9|Chidananda Raju|  35|      Spark|         DevHunter|
+--------+---------------+----+-----------+------------------+

FULL_OUTER JOIN
+--------+---------------+----+-----------+------------------+
|personid|           name| age|profileName|profileDescription|
+--------+---------------+----+-----------+------------------+
|       0|           Ravi|  42|      Spark|         Committer|
|       1|           null|null|      Spark|       All Rounder|
|       2|        Nataraj|  45|      Spark|    SparkSQLMaster|
|       2|         Meghna|  22|      Spark|    SparkSQLMaster|
|       2|        Snigdha|  22|      Spark|    SparkSQLMaster|
|       3|          Madhu|  22|      Spark|        Evangelist|
|       4|       Siddhika|  22|       null|              null|
|       5|       Srinivas|  45|      Spark|         SparkGuru|
|       6|       Harshita|  22|       null|              null|
|       8|      Deekshita|  22|       null|              null|
|       9|          Ashik|  22|      Spark|         DevHunter|
|       9|            Ram|  42|      Spark|         DevHunter|
|       9|Chidananda Raju|  35|      Spark|         DevHunter|
|       9|Sreekanth Doddy|  29|      Spark|         DevHunter|
+--------+---------------+----+-----------+------------------+

LEFT JOIN
+--------+---------------+---+-----------+------------------+
|personid|           name|age|profileName|profileDescription|
+--------+---------------+---+-----------+------------------+
|       0|           Ravi| 42|      Spark|         Committer|
|       2|        Snigdha| 22|      Spark|    SparkSQLMaster|
|       2|         Meghna| 22|      Spark|    SparkSQLMaster|
|       2|        Nataraj| 45|      Spark|    SparkSQLMaster|
|       3|          Madhu| 22|      Spark|        Evangelist|
|       4|       Siddhika| 22|       null|              null|
|       5|       Srinivas| 45|      Spark|         SparkGuru|
|       6|       Harshita| 22|       null|              null|
|       8|      Deekshita| 22|       null|              null|
|       9|            Ram| 42|      Spark|         DevHunter|
|       9|          Ashik| 22|      Spark|         DevHunter|
|       9|Chidananda Raju| 35|      Spark|         DevHunter|
|       9|Sreekanth Doddy| 29|      Spark|         DevHunter|
+--------+---------------+---+-----------+------------------+

LEFT_OUTER JOIN
+--------+---------------+---+-----------+------------------+
|personid|           name|age|profileName|profileDescription|
+--------+---------------+---+-----------+------------------+
|       0|           Ravi| 42|      Spark|         Committer|
|       2|        Nataraj| 45|      Spark|    SparkSQLMaster|
|       2|         Meghna| 22|      Spark|    SparkSQLMaster|
|       2|        Snigdha| 22|      Spark|    SparkSQLMaster|
|       3|          Madhu| 22|      Spark|        Evangelist|
|       4|       Siddhika| 22|       null|              null|
|       5|       Srinivas| 45|      Spark|         SparkGuru|
|       6|       Harshita| 22|       null|              null|
|       8|      Deekshita| 22|       null|              null|
|       9|Chidananda Raju| 35|      Spark|         DevHunter|
|       9|Sreekanth Doddy| 29|      Spark|         DevHunter|
|       9|          Ashik| 22|      Spark|         DevHunter|
|       9|            Ram| 42|      Spark|         DevHunter|
+--------+---------------+---+-----------+------------------+

RIGHT JOIN
+--------+---------------+----+-----------+------------------+
|personid|           name| age|profileName|profileDescription|
+--------+---------------+----+-----------+------------------+
|       0|           Ravi|  42|      Spark|         Committer|
|       1|           null|null|      Spark|       All Rounder|
|       2|        Snigdha|  22|      Spark|    SparkSQLMaster|
|       2|         Meghna|  22|      Spark|    SparkSQLMaster|
|       2|        Nataraj|  45|      Spark|    SparkSQLMaster|
|       3|          Madhu|  22|      Spark|        Evangelist|
|       5|       Srinivas|  45|      Spark|         SparkGuru|
|       9|Sreekanth Doddy|  29|      Spark|         DevHunter|
|       9|Chidananda Raju|  35|      Spark|         DevHunter|
|       9|            Ram|  42|      Spark|         DevHunter|
|       9|          Ashik|  22|      Spark|         DevHunter|
+--------+---------------+----+-----------+------------------+

RIGHT_OUTER JOIN
+--------+---------------+----+-----------+------------------+
|personid|           name| age|profileName|profileDescription|
+--------+---------------+----+-----------+------------------+
|       0|           Ravi|  42|      Spark|         Committer|
|       1|           null|null|      Spark|       All Rounder|
|       2|         Meghna|  22|      Spark|    SparkSQLMaster|
|       2|        Snigdha|  22|      Spark|    SparkSQLMaster|
|       2|        Nataraj|  45|      Spark|    SparkSQLMaster|
|       3|          Madhu|  22|      Spark|        Evangelist|
|       5|       Srinivas|  45|      Spark|         SparkGuru|
|       9|Sreekanth Doddy|  29|      Spark|         DevHunter|
|       9|          Ashik|  22|      Spark|         DevHunter|
|       9|Chidananda Raju|  35|      Spark|         DevHunter|
|       9|            Ram|  42|      Spark|         DevHunter|
+--------+---------------+----+-----------+------------------+

LEFT_SEMI JOIN
+--------+---------------+---+
|personid|           name|age|
+--------+---------------+---+
|       0|           Ravi| 42|
|       2|        Nataraj| 45|
|       2|         Meghna| 22|
|       2|        Snigdha| 22|
|       3|          Madhu| 22|
|       5|       Srinivas| 45|
|       9|Chidananda Raju| 35|
|       9|Sreekanth Doddy| 29|
|       9|            Ram| 42|
|       9|          Ashik| 22|
+--------+---------------+---+

LEFT_ANTI JOIN
+--------+---------+---+
|personid|     name|age|
+--------+---------+---+
|       4| Siddhika| 22|
|       6| Harshita| 22|
|       8|Deekshita| 22|
+--------+---------+---+


Till 1.x  cross join is :  df_asPerson.join(df_asProfile)

 Explicit Cross Join in 2.x :
 http://blog.madhukaraphatak.com/migrating-to-spark-two-part-4/
 Cartesian joins are very expensive without an extra filter that can be pushed down.

 cross join or cartesian product



+---------------+---+--------+-----------+--------+------------------+
|name           |age|personid|profileName|personid|profileDescription|
+---------------+---+--------+-----------+--------+------------------+
|Nataraj        |45 |2       |Spark      |2       |SparkSQLMaster    |
|Nataraj        |45 |2       |Spark      |5       |SparkGuru         |
|Nataraj        |45 |2       |Spark      |9       |DevHunter         |
|Nataraj        |45 |2       |Spark      |3       |Evangelist        |
|Nataraj        |45 |2       |Spark      |0       |Committer         |
|Nataraj        |45 |2       |Spark      |1       |All Rounder       |
|Srinivas       |45 |5       |Spark      |2       |SparkSQLMaster    |
|Srinivas       |45 |5       |Spark      |5       |SparkGuru         |
|Srinivas       |45 |5       |Spark      |9       |DevHunter         |
|Srinivas       |45 |5       |Spark      |3       |Evangelist        |
|Srinivas       |45 |5       |Spark      |0       |Committer         |
|Srinivas       |45 |5       |Spark      |1       |All Rounder       |
|Ashik          |22 |9       |Spark      |2       |SparkSQLMaster    |
|Ashik          |22 |9       |Spark      |5       |SparkGuru         |
|Ashik          |22 |9       |Spark      |9       |DevHunter         |
|Ashik          |22 |9       |Spark      |3       |Evangelist        |
|Ashik          |22 |9       |Spark      |0       |Committer         |
|Ashik          |22 |9       |Spark      |1       |All Rounder       |
|Deekshita      |22 |8       |Spark      |2       |SparkSQLMaster    |
|Deekshita      |22 |8       |Spark      |5       |SparkGuru         |
|Deekshita      |22 |8       |Spark      |9       |DevHunter         |
|Deekshita      |22 |8       |Spark      |3       |Evangelist        |
|Deekshita      |22 |8       |Spark      |0       |Committer         |
|Deekshita      |22 |8       |Spark      |1       |All Rounder       |
|Siddhika       |22 |4       |Spark      |2       |SparkSQLMaster    |
|Siddhika       |22 |4       |Spark      |5       |SparkGuru         |
|Siddhika       |22 |4       |Spark      |9       |DevHunter         |
|Siddhika       |22 |4       |Spark      |3       |Evangelist        |
|Siddhika       |22 |4       |Spark      |0       |Committer         |
|Siddhika       |22 |4       |Spark      |1       |All Rounder       |
|Madhu          |22 |3       |Spark      |2       |SparkSQLMaster    |
|Madhu          |22 |3       |Spark      |5       |SparkGuru         |
|Madhu          |22 |3       |Spark      |9       |DevHunter         |
|Madhu          |22 |3       |Spark      |3       |Evangelist        |
|Madhu          |22 |3       |Spark      |0       |Committer         |
|Madhu          |22 |3       |Spark      |1       |All Rounder       |
|Meghna         |22 |2       |Spark      |2       |SparkSQLMaster    |
|Meghna         |22 |2       |Spark      |5       |SparkGuru         |
|Meghna         |22 |2       |Spark      |9       |DevHunter         |
|Meghna         |22 |2       |Spark      |3       |Evangelist        |
|Meghna         |22 |2       |Spark      |0       |Committer         |
|Meghna         |22 |2       |Spark      |1       |All Rounder       |
|Snigdha        |22 |2       |Spark      |2       |SparkSQLMaster    |
|Snigdha        |22 |2       |Spark      |5       |SparkGuru         |
|Snigdha        |22 |2       |Spark      |9       |DevHunter         |
|Snigdha        |22 |2       |Spark      |3       |Evangelist        |
|Snigdha        |22 |2       |Spark      |0       |Committer         |
|Snigdha        |22 |2       |Spark      |1       |All Rounder       |
|Harshita       |22 |6       |Spark      |2       |SparkSQLMaster    |
|Harshita       |22 |6       |Spark      |5       |SparkGuru         |
|Harshita       |22 |6       |Spark      |9       |DevHunter         |
|Harshita       |22 |6       |Spark      |3       |Evangelist        |
|Harshita       |22 |6       |Spark      |0       |Committer         |
|Harshita       |22 |6       |Spark      |1       |All Rounder       |
|Ravi           |42 |0       |Spark      |2       |SparkSQLMaster    |
|Ravi           |42 |0       |Spark      |5       |SparkGuru         |
|Ravi           |42 |0       |Spark      |9       |DevHunter         |
|Ravi           |42 |0       |Spark      |3       |Evangelist        |
|Ravi           |42 |0       |Spark      |0       |Committer         |
|Ravi           |42 |0       |Spark      |1       |All Rounder       |
|Ram            |42 |9       |Spark      |2       |SparkSQLMaster    |
|Ram            |42 |9       |Spark      |5       |SparkGuru         |
|Ram            |42 |9       |Spark      |9       |DevHunter         |
|Ram            |42 |9       |Spark      |3       |Evangelist        |
|Ram            |42 |9       |Spark      |0       |Committer         |
|Ram            |42 |9       |Spark      |1       |All Rounder       |
|Chidananda Raju|35 |9       |Spark      |2       |SparkSQLMaster    |
|Chidananda Raju|35 |9       |Spark      |5       |SparkGuru         |
|Chidananda Raju|35 |9       |Spark      |9       |DevHunter         |
|Chidananda Raju|35 |9       |Spark      |3       |Evangelist        |
|Chidananda Raju|35 |9       |Spark      |0       |Committer         |
|Chidananda Raju|35 |9       |Spark      |1       |All Rounder       |
|Sreekanth Doddy|29 |9       |Spark      |2       |SparkSQLMaster    |
|Sreekanth Doddy|29 |9       |Spark      |5       |SparkGuru         |
|Sreekanth Doddy|29 |9       |Spark      |9       |DevHunter         |
|Sreekanth Doddy|29 |9       |Spark      |3       |Evangelist        |
|Sreekanth Doddy|29 |9       |Spark      |0       |Committer         |
|Sreekanth Doddy|29 |9       |Spark      |1       |All Rounder       |
+---------------+---+--------+-----------+--------+------------------+

== Physical Plan ==
BroadcastNestedLoopJoin BuildRight, Cross
:- LocalTableScan [name#0, age#1, personid#2]
+- BroadcastExchange IdentityBroadcastMode
   +- LocalTableScan [profileName#7, personid#8, profileDescription#9]
()
78
createOrReplaceTempView example 

Creates a local temporary view using the given name. The lifetime of this
   temporary view is tied to the [[SparkSession]] that was used to create this Dataset.

createOrReplaceTempView  sql 
SELECT dfperson.name
, dfperson.age
, dfprofile.profileDescription
  FROM  dfperson JOIN  dfprofile
 ON dfperson.personid == dfprofile.personid

+---------------+---+------------------+
|           name|age|profileDescription|
+---------------+---+------------------+
|        Nataraj| 45|    SparkSQLMaster|
|       Srinivas| 45|         SparkGuru|
|          Ashik| 22|         DevHunter|
|          Madhu| 22|        Evangelist|
|         Meghna| 22|    SparkSQLMaster|
|        Snigdha| 22|    SparkSQLMaster|
|           Ravi| 42|         Committer|
|            Ram| 42|         DevHunter|
|Chidananda Raju| 35|         DevHunter|
|Sreekanth Doddy| 29|         DevHunter|
+---------------+---+------------------+



**** EXCEPT DEMO ***


 df_asPerson.except(df_asProfile) Except demo
+---------------+---+--------+
|           name|age|personid|
+---------------+---+--------+
|          Ashik| 22|       9|
|       Harshita| 22|       6|
|          Madhu| 22|       3|
|            Ram| 42|       9|
|           Ravi| 42|       0|
|Chidananda Raju| 35|       9|
|       Siddhika| 22|       4|
|       Srinivas| 45|       5|
|Sreekanth Doddy| 29|       9|
|      Deekshita| 22|       8|
|         Meghna| 22|       2|
|        Snigdha| 22|       2|
|        Nataraj| 45|       2|
+---------------+---+--------+

 df_asProfile.except(df_asPerson) Except demo
+-----------+--------+------------------+
|profileName|personid|profileDescription|
+-----------+--------+------------------+
|      Spark|       5|         SparkGuru|
|      Spark|       9|         DevHunter|
|      Spark|       2|    SparkSQLMaster|
|      Spark|       3|        Evangelist|
|      Spark|       0|         Committer|
|      Spark|       1|       All Rounder|
+-----------+--------+------------------+

如上所述,这些是所有联接的维恩图。 enter image description here

答案 3 :(得分:1)

https://spark.apache.org/docs/1.5.1/api/java/org/apache/spark/sql/DataFrame.html开始,使用join

  

使用给定列与另一个DataFrame进行内部equi-join。

PersonDf.join(ProfileDf,$"personId")

OR

PersonDf.join(ProfileDf,PersonDf("personId") === ProfileDf("personId"))

<强>更新

您还可以使用DFsdf.registerTempTable("tableName")保存为临时表,并且可以使用sqlContext编写SQL查询。

答案 4 :(得分:0)

一种方式

// join type can be inner, left, right, fullouter
val mergedDf = df1.join(df2, Seq("keyCol"), "inner")
// keyCol can be multiple column names seperated by comma
val mergedDf = df1.join(df2, Seq("keyCol1", "keyCol2"), "left")

另一种方式

import spark.implicits._ 
val mergedDf = df1.as("d1").join(df2.as("d2"), ($"d1.colName" === $"d2.colName"))
// to select specific columns as output
val mergedDf = df1.as("d1").join(df2.as("d2"), ($"d1.colName" === $"d2.colName")).select($"d1.*", $"d2.anotherColName")

答案 5 :(得分:0)

与scala的内部加入

val joinedDataFrame = PersonDf.join(ProfileDf ,"personId")
joinedDataFrame.show

答案 6 :(得分:0)

发布基于Java的解决方案,以防您的团队仅使用Java。关键字inner将确保最终数据帧中仅存在匹配的行。

            Dataset<Row> joined = PersonDf.join(ProfileDf, 
                    PersonDf.col("personId").equalTo(ProfileDf.col("personId")),
                    "inner");
            joined.show();

答案 7 :(得分:0)

让我举例说明

  1. 创建emp数据框

    导入spark.sqlContext.implicits._ val emp = Seq((1,“ Smith”,-1,“ 2018”,“ 10”,“ M”,3000), (2,“ Rose”,1,“ 2010”,“ 20”,“ M”,4000), (3,“威廉姆斯”,1,“ 2010”,“ 10”,“ M”,1000), (4,“ Jones”,2,“ 2005”,“ 10”,“ F”,2000), (5,“棕色”,2,“ 2010”,“ 40”,“”,-1), (6,“棕色”,2,“ 2010”,“ 50”,“”,-1) ) val empColumns = Seq(“ emp_id”,“ name”,“ superior_emp_id”,“ year_joined”, “ emp_dept_id”,“性别”,“工资”)

    val empDF = emp.toDF(empColumns:_ *)

  2. 创建部门DataFrame

    val dept = Seq((“ Finance”,10), (“营销”,20), (“销售”,30), (“ IT”,40) )

    val deptColumns = Seq(“ dept_name”,“ dept_id”) val deptDF = dept.toDF(deptColumns:_ *)

现在让我们将emp.emp_dept_id与dept.dept_id结合起来

empDF.join(deptDF,empDF("emp_dept_id") ===  deptDF("dept_id"),"inner")
    .show(false)

以下结果

+------+--------+---------------+-----------+-----------+------+------+---------+-------+
|emp_id|name    |superior_emp_id|year_joined|emp_dept_id|gender|salary|dept_name|dept_id|
+------+--------+---------------+-----------+-----------+------+------+---------+-------+
|1     |Smith   |-1             |2018       |10         |M     |3000  |Finance  |10     |
|2     |Rose    |1              |2010       |20         |M     |4000  |Marketing|20     |
|3     |Williams|1              |2010       |10         |M     |1000  |Finance  |10     |
|4     |Jones   |2              |2005       |10         |F     |2000  |Finance  |10     |
|5     |Brown   |2              |2010       |40         |      |-1    |IT       |40     |
+------+--------+---------------+-----------+-----------+------+------+---------+-------+

如果您正在使用示例查找python PySpark Join,并在Spark Join上找到完整的Scala示例