Spark Aggregator output is List[Character]
case class Character(name: String, secondName: String, faculty: String)
val charColumn = HPAggregator.toColumn
val resultDF = someDF.select(charColumn)
So my DataFrame looks like this:
+-----------------------------------------------+
| value |
+-----------------------------------------------+
|[[harry, potter, gryffindor],[ron, weasley ... |
+-----------------------------------------------+
Now I want to convert it so that each Character becomes its own row, with name, secondName and faculty as separate columns.
How can I do that?
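Answer 0 (score: 4)
This can be done with explode, followed by selecting the elements of each resulting array as separate columns.
Here is an example: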
>>> df = spark.createDataFrame([[[['a','b','c'], ['d','e','f'], ['g','h','i']]]],["col1"])
>>> df.show(20, False)
+---------------------------------------------------------------------+
|col1 |
+---------------------------------------------------------------------+
|[WrappedArray(a, b, c), WrappedArray(d, e, f), WrappedArray(g, h, i)]|
+---------------------------------------------------------------------+
>>> from pyspark.sql.functions import explode
>>> out_df = df.withColumn("col2", explode(df.col1)).drop('col1')
>>> out_df.show()
+---------+
| col2|
+---------+
|[a, b, c]|
|[d, e, f]|
|[g, h, i]|
+---------+
>>> out_df.select(out_df.col2[0].alias('c1'), out_df.col2[1].alias('c2'), out_df.col2[2].alias('c3')).show()
+---+---+---+
| c1| c2| c3|
+---+---+---+
| a| b| c|
| d| e| f|
| g| h| i|
+---+---+---+
>>>
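The example above uses an array of arrays, while the question's column holds Character records. The following is a minimal sketch of the same two steps for that case, assuming the column is named value and is an array of structs with the fields name, secondName and faculty. The function names and the second sample record are hypothetical. The plain-Python function only models what the two steps do, so the expected shape can be checked without a Spark session; the Spark variant is not executed here.

```python
def explode_characters(rows):
    """Plain-Python model of the two steps: one output row per Character."""
    out = []
    for row in rows:
        for character in row["value"]:      # step 1 (explode): one element -> one row
            out.append({                     # step 2 (select fields): one field -> one column
                "name": character["name"],
                "secondName": character["secondName"],
                "faculty": character["faculty"],
            })
    return out


def explode_characters_spark(df):
    """Same two steps on a Spark DataFrame.

    Assumption: `value` is an array of structs; "c.*" expands the struct fields.
    """
    # Imported lazily so the plain-Python model above runs without Spark installed.
    from pyspark.sql.functions import explode
    return df.select(explode("value").alias("c")).select("c.*")


# Placeholder data: the first record comes from the question, the second is made up.
sample = [{"value": [
    {"name": "harry", "secondName": "potter", "faculty": "gryffindor"},
    {"name": "a", "secondName": "b", "faculty": "c"},
]}]
flat = explode_characters(sample)
print(flat)
```

If the column really is an array of structs, selecting "c.*" avoids indexing each field by position as in col2[0], col2[1], col2[2] above, and the resulting columns keep the struct's field names.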