Question

I am working with a DataFrame df that contains columns of several types (array[struct], double, ...). I want to change the schema of the DataFrame. The existing schema:

    root
    |-- _id: long (nullable = true)
    |-- d: array (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- col1: struct (nullable = false)
    |    |    |    |-- col1: struct (nullable = false)
    |    |    |    |    |-- value: string (nullable = true)
    |    |    |    |    |-- type: string (nullable = false)
    |    |    |-- resource: string (nullable = true)
    |    |    |-- cri: string (nullable = true)
    |-- d2: array (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- col1: struct (nullable = false)
    |    |    |    |-- col1: struct (nullable = false)
    |    |    |    |    |-- value: string (nullable = true)
    |    |    |    |    |-- type: string (nullable = false)
    |    |    |-- resource: string (nullable = true)
    |    |    |-- cri: string (nullable = true)
    |-- d4: array (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- col1: struct (nullable = false)
    |    |    |    |-- value: string (nullable = true)
    |    |    |    |-- type: string (nullable = false)
    |    |    |-- value: string (nullable = true)
    |    |    |-- vn: double (nullable = true)

I want to get a DataFrame with a schema like this:

    root
    |-- context_id: long (nullable = true)
    |-- data: array (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- col1: struct (nullable = false)
    |    |    |    |-- col1: struct (nullable = false)
    |    |    |    |    |-- value: string (nullable = true)
    |    |    |    |    |-- type: string (nullable = false)
    |    |    |-- resource: string (nullable = true)
    |    |    |-- cri: string (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- col1: struct (nullable = false)
    |    |    |    |-- col1: struct (nullable = false)
    |    |    |    |    |-- value: string (nullable = true)
    |    |    |    |    |-- type: string (nullable = false)
    |    |    |-- resource: string (nullable = true)
    |    |    |-- cri: string (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- col1: struct (nullable = false)
    |    |    |    |-- value: string (nullable = true)
    |    |    |    |-- type: string (nullable = false)
    |    |    |-- value: string (nullable = true)
    |    |    |-- vn: double (nullable = true)

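I tried to do this with Spark SQL, but as shown above, the schema contains complex nested types. I also tried to create a UDF to concat the three arrays, but that failed because {{1} } does not support UDFs and WrappedArray[StructType] ... Any suggestions, please?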

How do I change the schema of a DataFrame?

0 answers: