I am trying to build an array of structs from the records returned by a Spark SQL query. Is there a way to push the records into an array of structs per key?
For example, after running the Spark SQL query I have the following data:
ID    NAME    DEPT  FROM_DT     TO_DT       EMAIL
-----------------------------------------------------------
1234  Robert  101   02/01/2012  03/14/2014  1234@GOG.com
1234  Robert  102   03/15/2014  07/04/2015  1234@GOG.com
1234  Robert  103   07/05/2015  03/25/2019  1234@GOG.com
6754  Albert  102   03/01/2012  09/19/2015  6754@GOG.com
6754  Albert  101   09/20/2015  03/25/2019  6754@GOG.com

I am trying to format the above result set into the following structure using PySpark 2:
{1234, Robert, [{DEPT:101, FROM_DT:02/01/2012, TO_DT:03/14/2014}, {DEPT:102, FROM_DT:03/15/2014, TO_DT:07/04/2015}, {DEPT:103, FROM_DT:07/05/2015, TO_DT:03/25/2019}], 1234@GOG.com}
{6754, Albert, [{DEPT:102, FROM_DT:03/01/2012, TO_DT:09/19/2015}, {DEPT:101, FROM_DT:09/20/2015, TO_DT:03/25/2019}], 6754@GOG.com}
This is what I have so far:

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Department history file; fields are assumed comma-delimited
raw_dept_data = sc.textFile("Raw_DEPT_File/part-m-00000").map(lambda line: line.split(","))
dept_rdd = raw_dept_data.map(lambda r: Row(ID=r[0], NAME=r[1], DEPT=r[2], FROM_DT=r[3], TO_DT=r[4]))
dept_dataframe = spark.createDataFrame(dept_rdd)
dept_dataframe.createOrReplaceTempView("History_Dept")

# Email file; same delimiter assumption
email_data = sc.textFile("Raw_Email_File/part-m-00000").map(lambda line: line.split(","))
email_rdd = email_data.map(lambda r: Row(ID=r[0], NAME=r[1], EMAIL=r[2]))
email_dataframe = spark.createDataFrame(email_rdd)
email_dataframe.createOrReplaceTempView("History_Email")

joined = spark.sql(
    "SELECT DP.ID, EM.NAME, DP.DEPT, DP.FROM_DT, DP.TO_DT, EM.EMAIL "
    "FROM History_Dept AS DP JOIN History_Email AS EM ON DP.ID = EM.ID"
)
How can I convert this result into the format shown above?