How do I change the schema of a DataFrame?

Time: 2018-10-03 15:03:16

Tags: scala apache-spark apache-spark-sql

I am working with a DataFrame `df` whose columns have several types: array[struct], double, ... I want to change the DataFrame's schema. The existing schema:

    root
    |-- _id: long (nullable = true)
    |-- d: array (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- col1: struct (nullable = false)
    |    |    |    |-- col1: struct (nullable = false)
    |    |    |    |    |-- value: string (nullable = true)
    |    |    |    |    |-- type: string (nullable = false)
    |    |    |-- resource: string (nullable = true)
    |    |    |-- cri: string (nullable = true)
    |-- d2: array (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- col1: struct (nullable = false)
    |    |    |    |-- col1: struct (nullable = false)
    |    |    |    |    |-- value: string (nullable = true)
    |    |    |    |    |-- type: string (nullable = false)
    |    |    |-- resource: string (nullable = true)
    |    |    |-- cri: string (nullable = true)
    |-- d4: array (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- col1: struct (nullable = false)
    |    |    |    |-- value: string (nullable = true)
    |    |    |    |-- type: string (nullable = false)
    |    |    |-- value: string (nullable = true)
    |    |    |-- vn: double (nullable = true)

I want to get a DataFrame with a schema like this:

    root
    |-- context_id: long (nullable = true)
    |-- data: array (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- col1: struct (nullable = false)
    |    |    |    |-- col1: struct (nullable = false)
    |    |    |    |    |-- value: string (nullable = true)
    |    |    |    |    |-- type: string (nullable = false)
    |    |    |-- resource: string (nullable = true)
    |    |    |-- cri: string (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- col1: struct (nullable = false)
    |    |    |    |-- col1: struct (nullable = false)
    |    |    |    |    |-- value: string (nullable = true)
    |    |    |    |    |-- type: string (nullable = false)
    |    |    |-- resource: string (nullable = true)
    |    |    |-- cri: string (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- col1: struct (nullable = false)
    |    |    |    |-- value: string (nullable = true)
    |    |    |    |-- type: string (nullable = false)
    |    |    |-- value: string (nullable = true)
    |    |    |-- vn: double (nullable = true)

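I tried to do this with Spark SQL, but as shown above the schema is complex. I also tried to write a UDF to concat the three arrays, but that failed because {{1}} does not support a UDF over wrappedarray[structType] ... Any suggestions, please.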
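A possible starting point for the reshaping described above, offered as a sketch rather than a confirmed solution. A Spark array column has a single element type, so the target schema as printed (one `data` array with three different `element` structs) cannot be represented literally. The sketch below instead renames `_id` to `context_id` and groups the three arrays under one struct column named `data`. The object name `ReshapeSchema`, the helper `reshape`, and the simplified case classes are assumptions made for illustration; the real element structs from the question are deeper.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

object ReshapeSchema {
  // Simplified stand-ins for the element structs in the question (hypothetical).
  case class Elem(resource: String, cri: String)
  case class Elem4(value: String, vn: Double)
  case class Record(_id: Long, d: Seq[Elem], d2: Seq[Elem], d4: Seq[Elem4])

  // Renames _id to context_id and wraps the three array columns
  // in a single struct column called data, without using a UDF.
  def reshape(df: DataFrame): DataFrame =
    df.withColumnRenamed("_id", "context_id")
      .select(
        col("context_id"),
        struct(col("d"), col("d2"), col("d4")).as("data")
      )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("reshape-schema")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      Record(1L, Seq(Elem("r1", "c1")), Seq(Elem("r2", "c2")), Seq(Elem4("v", 1.5)))
    ).toDF()

    // The result has two top-level columns: context_id and data.
    reshape(df).printSchema()
    spark.stop()
  }
}
```

If a single flat array is really required, the elements of `d`, `d2` and `d4` would first need to share one struct type. `d` and `d2` already match, but `d4` does not. From Spark 2.4 onward, `concat` from `org.apache.spark.sql.functions` also accepts array columns of compatible types, which may avoid the UDF altogether.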

0 answers:

No answers