I am working with a Spark DataFrame that contains several array columns, and I want to produce a single array column that contains the elements of all of them.
root
|-- context_id: long (nullable = true)
|-- data1: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- col1: struct (nullable = false)
| | | | |-- v: string (nullable = true)
| | | | |-- t: string (nullable = false)
| | |-- resourcename: string (nullable = true)
| | |-- criticity: string (nullable = true)
|-- data2: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- col1: struct (nullable = false)
| | | | |-- v: string (nullable = true)
| | | | |-- t: string (nullable = false)
| | |-- resourcename: string (nullable = true)
| | |-- criticity: string (nullable = true)
|-- data4: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- v: string (nullable = true)
| | | |-- t: string (nullable = false)
| | |-- v: string (nullable = true)
| | |-- vn: double (nullable = true)
I would like a DataFrame like this:
root
|-- context_id: long (nullable = true)
|-- data: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- col1: struct (nullable = false)
| | | | |-- v: string (nullable = true)
| | | | |-- t: string (nullable = false)
| | |-- resourcename: string (nullable = true)
| | |-- criticity: string (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- col1: struct (nullable = false)
| | | | |-- v: string (nullable = true)
| | | | |-- t: string (nullable = false)
| | |-- resourcename: string (nullable = true)
| | |-- criticity: string (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- v: string (nullable = true)
| | | |-- t: string (nullable = false)
| | |-- v: string (nullable = true)
| | |-- vn: double (nullable = true)
So I tried to concatenate the array columns following the approach from enter link description here, but when I use mutable.WrappedArray[StructType] instead of mutable.WrappedArray[String] it gives me this error:
Exception in thread "main" scala.MatchError: org.apache.spark.sql.types.StructType (of class scala.reflect.internal.Types$ClassNoArgsTypeRef)
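In case it helps, this is roughly what the failing attempt looks like (the DataFrame `df` and the column names are simplified from my real code):

```scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.StructType
import scala.collection.mutable

// Roughly my attempt: a udf that concatenates the two arrays.
// It fails when the udf is created, because StructType is a schema
// descriptor, not a runtime element type Spark can map to Scala.
val concatArrays = udf(
  (a: mutable.WrappedArray[StructType],
   b: mutable.WrappedArray[StructType]) => a ++ b)

val result = df.withColumn("data", concatArrays($"data1", $"data2"))
```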
I also tried enter link description here with Seq[Row], and it gives the same kind of error; I think the udf in that example accepts a Seq[Row] as an argument but does not return one:
Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Row is not supported
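The Row-based variant I tried looks roughly like this (again simplified):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Same idea with Row: Spark cannot derive a return schema for a
// generic Row, hence "Schema for type org.apache.spark.sql.Row is
// not supported".
val concatRows = udf((a: Seq[Row], b: Seq[Row]) => a ++ b)

val result = df.withColumn("data", concatRows($"data1", $"data2"))
```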
Please help!