I am working with a DataFrame df whose columns have several types, including array[struct] and double... I want to change the DataFrame's schema.
Existing schema:
root
|-- _id: long (nullable = true)
|-- d: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- col1: struct (nullable = false)
| | | | |-- value: string (nullable = true)
| | | | |-- type: string (nullable = false)
| | |-- resource: string (nullable = true)
| | |-- cri: string (nullable = true)
|-- d2: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- col1: struct (nullable = false)
| | | | |-- value: string (nullable = true)
| | | | |-- type: string (nullable = false)
| | |-- resource: string (nullable = true)
| | |-- cri: string (nullable = true)
|-- d4: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- value: string (nullable = true)
| | | |-- type: string (nullable = false)
| | |-- value: string (nullable = true)
| | |-- vn: double (nullable = true)
I want to get a DataFrame with a schema like this:
root
|-- context_id: long (nullable = true)
|-- data: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- col1: struct (nullable = false)
| | | | |-- value: string (nullable = true)
| | | | |-- type: string (nullable = false)
| | |-- resource: string (nullable = true)
| | |-- cri: string (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- col1: struct (nullable = false)
| | | | |-- value: string (nullable = true)
| | | | |-- type: string (nullable = false)
| | |-- resource: string (nullable = true)
| | |-- cri: string (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- value: string (nullable = true)
| | | |-- type: string (nullable = false)
| | |-- value: string (nullable = true)
| | |-- vn: double (nullable = true)
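I tried to do this with Spark SQL, but as shown above, the schema is complex. I also tried to create a udf to concat the three arrays, but that failed because {{1} } does not support udf and wrappedarray[structType]...
Any suggestions, please?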