Is there a way to build an array of structs from Spark SQL query results in PySpark?

Time: 2019-03-25 17:32:33

Tags: apache-spark pyspark pyspark-sql hadoop-streaming apache-spark-2.0

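I am trying to build an array of structs from the records returned by a Spark SQL query. Is there a way to push those records into an array of structs?

For example, after running the Spark SQL query I have the following data: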



    ID     NAME     DEPT   FROM_DT      TO_DT        EMAIL
    ----------------------------------------------------------------
    1234   Robert   101    02/01/2012   03/14/2014   1234@GOG.com
    1234   Robert   102    03/15/2014   07/04/2015   1234@GOG.com
    1234   Robert   103    07/05/2015   03/25/2019   1234@GOG.com
    6754   Albert   102    03/01/2012   09/19/2015   6754@GOG.com
    6754   Albert   101    09/20/2015   03/25/2019   6754@GOG.com

    

I am trying to convert the above result set into the following format using PySpark 2:

    {1234, Robert, [{DEPT:101, FROM_DT:02/01/2012, TO_DT:03/14/2014}, {DEPT:102, FROM_DT:03/15/2014, TO_DT:07/04/2015}, {DEPT:103, FROM_DT:07/05/2015, TO_DT:03/25/2019}], 1234@GOG.com}
    {6754, Albert, [{DEPT:102, FROM_DT:03/01/2012, TO_DT:09/19/2015}, {DEPT:101, FROM_DT:09/20/2015, TO_DT:03/25/2019}], 6754@GOG.com}
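
To make the target shape concrete, here is a plain-Python sketch (no Spark involved) that nests the sample rows the same way. The function name `nest_by_employee` and the field name `DEPT_HISTORY` are hypothetical and chosen only for illustration; the desired output above does not name the array field.

```python
# Sample rows copied from the result set: (ID, NAME, DEPT, FROM_DT, TO_DT, EMAIL)
rows = [
    ("1234", "Robert", "101", "02/01/2012", "03/14/2014", "1234@GOG.com"),
    ("1234", "Robert", "102", "03/15/2014", "07/04/2015", "1234@GOG.com"),
    ("1234", "Robert", "103", "07/05/2015", "03/25/2019", "1234@GOG.com"),
    ("6754", "Albert", "102", "03/01/2012", "09/19/2015", "6754@GOG.com"),
    ("6754", "Albert", "101", "09/20/2015", "03/25/2019", "6754@GOG.com"),
]


def nest_by_employee(flat_rows):
    """Group flat rows into one record per (ID, NAME, EMAIL)."""
    grouped = {}  # dicts keep insertion order, so employees stay in input order
    for emp_id, name, dept, from_dt, to_dt, email in flat_rows:
        grouped.setdefault((emp_id, name, email), []).append(
            {"DEPT": dept, "FROM_DT": from_dt, "TO_DT": to_dt}
        )
    # DEPT_HISTORY is a hypothetical name for the array-of-structs field.
    return [
        {"ID": emp_id, "NAME": name, "DEPT_HISTORY": history, "EMAIL": email}
        for (emp_id, name, email), history in grouped.items()
    ]


nested_records = nest_by_employee(rows)
for record in nested_records:
    print(record)
```

Each output record carries the scalar columns once and a list of per-department dictionaries, which is the nesting the Spark result would need to reproduce.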

    from pyspark.sql import Row, SparkSession

    # Reuse the active session (the `spark` object in the pyspark shell) or create one.
    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    # Assumption: both files are comma-delimited; change the separator to match the data.
    raw_dept_data = sc.textFile("Raw_DEPT_File/part-m-00000")
    dept_rdd = raw_dept_data.map(lambda line: line.split(",")) \
        .map(lambda r: Row(ID=r[0], NAME=r[1], DEPT=r[2], FROM_DT=r[3], TO_DT=r[4]))
    dept_dataframe = spark.createDataFrame(dept_rdd)
    dept_dataframe.createOrReplaceTempView("History_Dept")

    email_data = sc.textFile("Raw_Email_File/part-m-00000")
    email_rdd = email_data.map(lambda line: line.split(",")) \
        .map(lambda r: Row(ID=r[0], NAME=r[1], EMAIL=r[2]))
    email_dataframe = spark.createDataFrame(email_rdd)
    email_dataframe.createOrReplaceTempView("History_Email")

    # Join department history to email details on the employee ID.
    result = spark.sql("SELECT DP.ID, EM.NAME, DP.DEPT, DP.FROM_DT, DP.TO_DT, EM.EMAIL FROM History_Dept as DP, History_Email as EM WHERE DP.ID = EM.ID")

How can I convert the result into the format shown above?

0 answers:

No answers yet