Building a complex data type (Map) from joined data

Posted: 2019-07-08 09:12:11

Tags: scala apache-spark apache-spark-sql

I have joined data from two tables and want to convert the result into a complex data type (Map).

// required imports for the casts below
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StringType

// creating a DataFrame from the product data file
val proDF = spark.read.format("parquet").load("path")

// creating a DataFrame from the finance data file
val finDF = spark.read.format("parquet").load("path")

// joining both DataFrames on three key columns
val proFinJoinDF = proDF.joinWith(
  finDF,
  proDF("col1") === finDF("col1") &&
  proDF("col2") === finDF("col2") &&
  proDF("col3") === finDF("col3")
)

// saving the joined data as a temporary view
// (registerTempTable is deprecated since Spark 2.0)
proFinJoinDF.createOrReplaceTempView("join_data")

val new1 = spark.sql("""select 
_1.col1 as A,
_1.col2 as B,...
_2.col1 as P,
_2.col2 as Q,
_2.col3 as R... from join_data""" )

// cast every column to String so the values fit Map[String, String]
val strnew1 = new1.select(new1.columns.map(c => col(c).cast(StringType)): _*)

// case class describing the target row shape with the two Map columns
case class Pro_Fin(A: String, B: String, ProMap: Map[String, String], FinMap: Map[String, String])

Now I want to convert the combined data into the above case class (Pro_Fin) and save it to a table.

Expected output:

A   B   ProMap<C:value, D:value …>   FinMap<P:value, Q:value …>
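One way to get this shape is the `map()` function from `org.apache.spark.sql.functions`, which takes alternating key/value expressions and builds a `MapType` column. Below is a minimal, self-contained sketch of that approach; the column names `C`, `D`, `P`, `Q` and the inline sample data are illustrative placeholders, since the real column lists are elided in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, map}

// target shape from the question
case class Pro_Fin(A: String, B: String, ProMap: Map[String, String], FinMap: Map[String, String])

object MapColumnSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-column-sketch")
      .getOrCreate()
    import spark.implicits._

    // stand-in for the joined-and-cast DataFrame (strnew1 in the question);
    // columns C, D, P, Q are hypothetical placeholders
    val strnew1 = Seq(
      ("a1", "b1", "c1", "d1", "p1", "q1")
    ).toDF("A", "B", "C", "D", "P", "Q")

    // map() takes alternating key/value expressions: map(key1, val1, key2, val2, ...)
    val result = strnew1.select(
      col("A"),
      col("B"),
      map(lit("C"), col("C"), lit("D"), col("D")).as("ProMap"),
      map(lit("P"), col("P"), lit("Q"), col("Q")).as("FinMap")
    ).as[Pro_Fin]

    result.show(false)
    spark.stop()
  }
}
```

From here, `result` is a `Dataset[Pro_Fin]` and can be persisted with the usual writers, e.g. `result.write.saveAsTable(...)`. For many columns, the key/value pairs could also be generated programmatically by flat-mapping over a list of column names into `lit(name) :: col(name) :: Nil` before passing them to `map()`.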

0 answers