I am using the Spark framework with Scala. My DataFrame has a column with the following structure and content:
+---------------------------------------------------------------------------------------------+
|Email_Code |
+---------------------------------------------------------------------------------------------+
|[WrappedArray([3,spain]), WrappedArray([,]), WrappedArray([3,spain])] |
|[WrappedArray([3,spain]), WrappedArray([3,spain])] |
+---------------------------------------------------------------------------------------------+
|-- Email_Code: array (nullable = true)
| |-- element: array (containsNull = false)
| | |-- element: struct (containsNull = false)
| | | |-- Code: string (nullable = true)
| | | |-- Value: string (nullable = true)
I am trying to write a UDF that extracts all the values of the `Code` field from the structs contained in the arrays, but I can't get it to work...
I would like an output like the following:
+---------------------------------------------------------------------------------------------+
|Email_Code |
+---------------------------------------------------------------------------------------------+
|[3,,3] |
|[3,3] |
+---------------------------------------------------------------------------------------------+
Any help, please?
Answer 0 (score: 0)
I managed to solve it:
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// flatten the nested arrays and keep only the Code field of each struct
val transformation = udf((data: Seq[Seq[Row]]) => data.flatten.map { case Row(code: String, _) => code })
df.withColumn("result", transformation($"columnName"))
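
As a side note, on Spark 2.4 or later the same result can be obtained without a UDF by combining the built-in flatten and transform SQL functions. This is only a sketch against the Email_Code column shown in the question; the result column name and the noUdf value are just illustrative:

import org.apache.spark.sql.functions.expr

// flatten turns array<array<struct>> into array<struct>;
// transform then keeps only the Code field of every struct
val noUdf = df.withColumn("result", expr("transform(flatten(Email_Code), x -> x.Code)"))

Unlike the pattern match in the UDF, this also tolerates a null Code value: it keeps a null element in the resulting array instead of throwing a MatchError.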