Multiple left joins do not work as expected in Spark 2.0 (Scala)

Asked: 2019-10-14 18:52:41

Tags: scala apache-spark apache-spark-sql

I have a DataFrame loaded from a database:

val listvaluesDF = spark.sqlContext.read.format("jdbc")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("url", "jdbc:sqlserver://azure.cloud.acme.com:14481;databaseName=dbadmin3")
  .option("dbtable", "(select distinct [key], value, internal from dbadmin3.V_LIST_VALUES where internal in ('year', 'wmt0SBU', 'wmt0Department', 'wmt0DeptCategory', 'wmt0DotcomOnly', 'wmt0WalmartWeek', 'wmt0SetWeek', 'wmt0Event', 'wmt1Qtr','seasonType')) tmp")
  .option("user", "aaa")
  .option("password", "xxx")
  .load()

I split it into multiple DataFrames, one per value of internal:

listvaluesDF.createOrReplaceTempView("listvaluesDF")

var dfYear = spark.sql("select key, value from listvaluesDF where internal = 'year'")
var dfSBU = spark.sql("select key, value from listvaluesDF where internal = 'wmt0SBU'")
var dfDept = spark.sql("select key, value from listvaluesDF where internal = 'wmt0Department'")
var dfDeptCategory = spark.sql("select key, value from listvaluesDF where internal = 'wmt0DeptCategory'")
var dfDotcom = spark.sql("select key, value from listvaluesDF where internal = 'wmt0DotcomOnly'")
var dfWalmartWeek = spark.sql("select key, value from listvaluesDF where internal = 'wmt0WalmartWeek'")
var dfSetWeek = spark.sql("select key, value from listvaluesDF where internal = 'wmt0SetWeek'")
var dfEvent = spark.sql("select key, value from listvaluesDF where internal = 'wmt0Event'")
var dfQtr = spark.sql("select key, value from listvaluesDF where internal = 'wmt1Qtr'")
var dfseasonType = spark.sql("select key, value from listvaluesDF where internal = 'seasonType'")

and then left-join each of them to the main DataFrame, like this:

val seasonFinalDF = seasonsDF.alias("seasonsDF")
  .join(paletteDF.alias("primaryPalette"), seasonsDF("primaryPalette") === paletteDF("id"), "left_outer")
  .join(flextypeDF.alias("SBU"), seasonsDF("hierarchy") === flextypeDF("key"), "left_outer")
  .join(dfYear.alias("fiscalYearEnding"), seasonsDF("fiscalYearEnding") === dfYear("key"), "left_outer")
  .join(dfSBU.alias("SBU"), seasonsDF("SBU") === dfSBU("key"), "left_outer")
  .join(dfDept.alias("department"), seasonsDF("department") === dfDept("key"), "left_outer")
  .join(dfDeptCategory.alias("dept_Category"), seasonsDF("dept_Category") === dfDeptCategory("key"), "left_outer")
  .join(dfDotcom.alias("dotcomOnly"), seasonsDF("dotcomOnly") === dfDotcom("key"), "left_outer")
  .join(dfseasonType.alias("type"), seasonsDF("type") === dfseasonType("key"), "left_outer")
  .join(dfWalmartWeek.alias("walmartWeek"), seasonsDF("walmartWeek") === dfWalmartWeek("key"), "left_outer")
  .join(dfSetWeek.alias("setWeek"), seasonsDF("setWeek") === dfSetWeek("key"), "left_outer")
  .join(dfEvent.alias("event"), seasonsDF("event") === dfEvent("key"), "left_outer")
  .join(dfQtr.alias("quarter"), seasonsDF("quarter") === dfQtr("key"), "left_outer")
  .select("seasonsDF.seasonMasterID", "seasonsDF.seasonName", "fiscalYearEnding.value", "SBU.value",
    "department.value", "dept_Category.value", "dotcomOnly.value", "seasonsDF.active", "type.value",
    "seasonsDF.createdDate", "seasonsDF.createdBy", "seasonsDF.updatedDate", "seasonsDF.modifiedBy",
    "seasonsDF.seasonId", "seasonsDF.flexID", "primaryPalette.paletteName", "walmartWeek.value",
    "setWeek.value", "event.value", "quarter.value", "hierarchy.DisplayName")
  .toDF("seasonMasterID", "seasonName", "fiscalYearEnding", "SBU", "department", "dept_Category",
    "dotcomOnly", "active", "type", "createdDate", "createdBy", "updatedDate", "modifiedBy",
    "seasonId", "flexID", "primaryPalette", "walmartWeek", "setWeek", "event", "quarter", "hierarchy")

The resulting DataFrame has this schema:

scala> seasonFinalDF.printSchema
root
 |-- seasonMasterID: long (nullable = true)
 |-- seasonName: string (nullable = true)
 |-- fiscalYearEnding: string (nullable = true)
 |-- SBU: string (nullable = true)
 |-- department: string (nullable = true)
 |-- dept_Category: string (nullable = true)
 |-- dotcomOnly: string (nullable = true)
 |-- active: integer (nullable = true)
 |-- type: string (nullable = true)
 |-- createdDate: timestamp (nullable = true)
 |-- createdBy: long (nullable = true)
 |-- updatedDate: timestamp (nullable = true)
 |-- modifiedBy: long (nullable = true)
 |-- seasonId: long (nullable = true)
 |-- flexID: string (nullable = true)
 |-- primaryPalette: string (nullable = true)
 |-- walmartWeek: string (nullable = true)
 |-- setWeek: string (nullable = true)
 |-- event: string (nullable = true)
 |-- quarter: string (nullable = true)

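In this case, every left join except the first one returns null values, and when I run explain, the plan shows the same SQL query being joined multiple times.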

I'm not sure where the problem is or what causes it. Can anyone help me find the correct way to do these left joins?
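
One possible cause, offered as an assumption rather than a confirmed diagnosis: dfYear, dfSBU and the other lookup DataFrames are all derived from the same listvaluesDF, so a reference such as dfYear("key") and one such as dfSBU("key") may resolve to the same underlying column, which can leave the join conditions ambiguous. Note also that the alias "SBU" is used twice in the query above (for flextypeDF and for dfSBU). The minimal sketch below builds each join condition from alias-qualified names via col("alias.column") instead. The sample rows, the alias names yearLkp and sbuLkp, and the helper joinWithLookups are hypothetical and exist only for illustration; only two of the lookups are shown.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AliasJoinSketch {

  // Hypothetical helper: joins two lookups that are both derived from the
  // same listvaluesDF, referring to columns only through their aliases.
  def joinWithLookups(seasonsDF: DataFrame, listvaluesDF: DataFrame): DataFrame = {
    val dfYear = listvaluesDF.filter(col("internal") === "year").select("key", "value")
    val dfSBU  = listvaluesDF.filter(col("internal") === "wmt0SBU").select("key", "value")

    seasonsDF.alias("seasonsDF")
      // Each lookup gets its own alias, distinct from any column name.
      .join(dfYear.alias("yearLkp"),
        col("seasonsDF.fiscalYearEnding") === col("yearLkp.key"), "left_outer")
      .join(dfSBU.alias("sbuLkp"),
        col("seasonsDF.SBU") === col("sbuLkp.key"), "left_outer")
      // Rename the looked-up values in the select instead of a trailing toDF.
      .select(
        col("seasonsDF.seasonMasterID"),
        col("seasonsDF.seasonName"),
        col("yearLkp.value").as("fiscalYearEnding"),
        col("sbuLkp.value").as("SBU"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("alias-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for the JDBC-backed listvaluesDF and for seasonsDF.
    val listvaluesDF = Seq(
      ("2019", "FY2019", "year"),
      ("S1", "Apparel", "wmt0SBU")
    ).toDF("key", "value", "internal")

    val seasonsDF = Seq(
      (1L, "Spring", "2019", "S1")
    ).toDF("seasonMasterID", "seasonName", "fiscalYearEnding", "SBU")

    joinWithLookups(seasonsDF, listvaluesDF).show()
    spark.stop()
  }
}
```

If this is indeed the cause, the same pattern would extend to the remaining lookups by giving each one a unique alias and a col("alias.key") condition.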

0 answers:

No answers