How do I get all the individual elements from MEMBERDETAIL?
scala> xmlDF.printSchema
root
|-- MEMBERDETAIL: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- FILE_ID: double (nullable = true)
| | |-- INP_SOURCE_ID: long (nullable = true)
| | |-- NET_DB_CR_SW: string (nullable = true)
| | |-- NET_PYM_AMT: string (nullable = true)
| | |-- ORGNTD_DB_CR_SW: string (nullable = true)
| | |-- ORGNTD_PYM_AMT: double (nullable = true)
| | |-- RCVD_DB_CR_SW: string (nullable = true)
| | |-- RCVD_PYM_AMT: string (nullable = true)
| | |-- RECON_DATE: string (nullable = true)
| | |-- SLNO: long (nullable = true)
scala> xmlDF.head
res147: org.apache.spark.sql.Row = [WrappedArray([1.1610100000001425E22,1,D, 94,842.38,C,0.0,D, 94,842.38,2016-10-10,1], [1.1610100000001425E22,1,D, 33,169.84,C,0.0,D, 33,169.84,2016-10-10,2], [1.1610110000001425E22,1,D, 155,500.88,C,0.0,D, 155,500.88,2016-10-11,3], [1.1610110000001425E22,1,D, 164,952.29,C,0.0,D, 164,952.29,2016-10-11,4], [1.1610110000001425E22,1,D, 203,061.06,C,0.0,D, 203,061.06,2016-10-11,5], [1.1610110000001425E22,1,D, 104,040.01,C,0.0,D, 104,040.01,2016-10-11,6], [2.1610110000001427E22,1,C, 849.14,C,849.14,C, 0.00,2016-10-11,7], [1.1610100000001465E22,1,D, 3.78,C,0.0,D, 3.78,2016-10-10,1], [1.1610100000001465E22,1,D, 261.54,C,0.0,D, ...
After trying many approaches, I was able to get the "Any" object shown below, but I still cannot read its fields individually.
xmlDF.select($"MEMBERDETAIL".getItem(0)).head().get(0)
res56: Any = [1.1610100000001425E22,1,D,94,842.38,C,0.0,D,94,842.38,2016-10-10,1]
And the StructType looks like this:
res61: org.apache.spark.sql.DataFrame = [MEMBERDETAIL[0]: struct<FILE_ID:double,INP_SOURCE_ID:bigint,NET_DB_CR_SW:string,NET_PYM_AMT:string,ORGNTD_DB_CR_SW:string,ORGNTD_PYM_AMT:double,RCVD_DB_CR_SW:string,RCVD_PYM_AMT:string,RECON_DATE:string,SLNO:bigint>]
Answer 0 (score: 0)
This is what actually worked for me:
xmlDF.selectExpr("explode(MEMBERDETAIL) as e").select("e.FILE_ID", "e.INP_SOURCE_ID", "e.NET_DB_CR_SW", "e.NET_PYM_AMT", "e.ORGNTD_DB_CR_SW", "e.ORGNTD_PYM_AMT", "e.RCVD_DB_CR_SW", "e.RCVD_PYM_AMT", "e.RECON_DATE").show()
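For anyone who wants every field without listing each one, here is a minimal sketch of the same explode approach using a struct wildcard ("e.*"). It also shows one way to read the fields of a single element on the driver, which is what the question was originally attempting. The case classes Detail and Record and the tiny in-memory xmlDF are hypothetical stand-ins with only two of the fields (FILE_ID and SLNO); the real DataFrame comes from the XML source. The sketch assumes Spark 2.x or later and that the nested Row carries its schema so fields can be looked up by name.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical stand-ins for the real schema, reduced to two fields for illustration.
case class Detail(FILE_ID: Double, SLNO: Long)
case class Record(MEMBERDETAIL: Seq[Detail])

object ExplodeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-sketch")
      .getOrCreate()
    import spark.implicits._

    // Small in-memory DataFrame with the same shape: an array of structs.
    val xmlDF = Seq(Record(Seq(Detail(1.0, 1L), Detail(2.0, 2L)))).toDF()

    // explode yields one row per array element; "e.*" expands every
    // struct field into its own top-level column.
    val flat = xmlDF.select(explode(col("MEMBERDETAIL")).as("e")).select("e.*")
    flat.show()

    // Reading one element on the driver: the struct comes back as a Row,
    // so use getStruct instead of get, then look fields up by name.
    val first: Row = xmlDF.select(col("MEMBERDETAIL").getItem(0)).head().getStruct(0)
    val fileId = first.getAs[Double]("FILE_ID")
    val slno = first.getAs[Long]("SLNO")
    println(s"FILE_ID=$fileId, SLNO=$slno")

    spark.stop()
  }
}
```

The wildcard keeps the select in step with the schema if fields are added later, while the explicit column list in the answer above is the better choice when only some fields are wanted.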