How to extract all individual elements from a nested WrappedArray in a Spark DataFrame

Time: 2016-10-23 12:14:50

Tags: scala apache-spark spark-dataframe hadoop2

How can I get all the individual elements from MEMBERDETAIL?

 scala> xmlDF.printSchema
    root
     |-- MEMBERDETAIL: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- FILE_ID: double (nullable = true)
     |    |    |-- INP_SOURCE_ID: long (nullable = true)
     |    |    |-- NET_DB_CR_SW: string (nullable = true)
     |    |    |-- NET_PYM_AMT: string (nullable = true)
     |    |    |-- ORGNTD_DB_CR_SW: string (nullable = true)
     |    |    |-- ORGNTD_PYM_AMT: double (nullable = true)
     |    |    |-- RCVD_DB_CR_SW: string (nullable = true)
     |    |    |-- RCVD_PYM_AMT: string (nullable = true)
     |    |    |-- RECON_DATE: string (nullable = true)
     |    |    |-- SLNO: long (nullable = true)

scala> xmlDF.head
res147: org.apache.spark.sql.Row = [WrappedArray([1.1610100000001425E22,1,D,        94,842.38,C,0.0,D,        94,842.38,2016-10-10,1], [1.1610100000001425E22,1,D,        33,169.84,C,0.0,D,        33,169.84,2016-10-10,2], [1.1610110000001425E22,1,D,       155,500.88,C,0.0,D,       155,500.88,2016-10-11,3], [1.1610110000001425E22,1,D,       164,952.29,C,0.0,D,       164,952.29,2016-10-11,4], [1.1610110000001425E22,1,D,       203,061.06,C,0.0,D,       203,061.06,2016-10-11,5], [1.1610110000001425E22,1,D,       104,040.01,C,0.0,D,       104,040.01,2016-10-11,6], [2.1610110000001427E22,1,C,           849.14,C,849.14,C,             0.00,2016-10-11,7], [1.1610100000001465E22,1,D,             3.78,C,0.0,D,             3.78,2016-10-10,1], [1.1610100000001465E22,1,D,           261.54,C,0.0,D,    ...

After trying many approaches, I was able to get an object of type Any, as shown below, but I still cannot read its fields individually.

xmlDF.select($"MEMBERDETAIL".getItem(0)).head().get(0)
res56: Any = [1.1610100000001425E22,1,D,94,842.38,C,0.0,D,94,842.38,2016-10-10,1]

The struct type of that column looks like this:

res61: org.apache.spark.sql.DataFrame = [MEMBERDETAIL[0]: struct<FILE_ID:double,INP_SOURCE_ID:bigint,NET_DB_CR_SW:string,NET_PYM_AMT:string,ORGNTD_DB_CR_SW:string,ORGNTD_PYM_AMT:double,RCVD_DB_CR_SW:string,RCVD_PYM_AMT:string,RECON_DATE:string,SLNO:bigint>]

1 Answer:

Answer 0 (score: 0)

This is what actually worked for me:

xmlDF.selectExpr("explode(MEMBERDETAIL) as e").select("e.FILE_ID", "e.INP_SOURCE_ID", "e.NET_DB_CR_SW", "e.NET_PYM_AMT", "e.ORGNTD_DB_CR_SW", "e.ORGNTD_PYM_AMT", "e.RCVD_DB_CR_SW", "e.RCVD_PYM_AMT", "e.RECON_DATE").show()
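
The select above lists nine of the ten struct fields (SLNO is not included). As a minimal sketch, assuming Spark 2.x or later with a local SparkSession, the same explode can be followed by "e.*" to expand every field without naming each one. The sketch also shows one way to read named fields from the single element that the question obtained as Any, by fetching it as a Row with getStruct. The case classes Member and Record and the object name ExplodeSketch are hypothetical stand-ins, not part of the original code. They model only four of the original fields, and the sample values are copied from the first two elements shown in the question.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical stand-ins for the real schema; only four of the ten
// original fields are modeled to keep the sketch short.
case class Member(FILE_ID: Double, NET_PYM_AMT: String, RECON_DATE: String, SLNO: Long)
case class Record(MEMBERDETAIL: Seq[Member])

object ExplodeSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is sufficient for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data standing in for xmlDF: one row holding an array of structs.
    val xmlDF = Seq(Record(Seq(
      Member(1.1610100000001425E22, "        94,842.38", "2016-10-10", 1L),
      Member(1.1610100000001425E22, "        33,169.84", "2016-10-10", 2L)
    ))).toDF()

    // explode yields one output row per array element;
    // "e.*" then expands every struct field into its own column.
    val flat = xmlDF.select(explode(col("MEMBERDETAIL")).as("e")).select("e.*")
    flat.show(false)

    // Alternative for a single element: fetch the struct as a Row
    // instead of Any, then read its fields by name.
    val first: Row = xmlDF.select(col("MEMBERDETAIL").getItem(0)).head().getStruct(0)
    val slno = first.getAs[Long]("SLNO")
    val amount = first.getAs[String]("NET_PYM_AMT")
    println(s"SLNO=$slno NET_PYM_AMT=$amount")

    spark.stop()
  }
}
```

The explode route keeps everything inside the DataFrame, so it scales with the data. The getStruct route pulls one row back to the driver and is mainly useful for quick inspection.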