How to merge multiple array-type columns of a DataFrame into a single column?

Asked: 2018-10-03 11:27:45

Tags: scala apache-spark apache-spark-sql

I am working with a Spark DataFrame that contains several array columns, and I want to get a single array that contains all of the arrays:

     root
        |-- context_id: long (nullable = true)
        |-- data1: array (nullable = true)
        |    |-- element: struct (containsNull = true)
        |    |    |-- col1: struct (nullable = false)
        |    |    |    |-- col1: struct (nullable = false)
        |    |    |    |    |-- v: string (nullable = true)
        |    |    |    |    |-- t: string (nullable = false)
        |    |    |-- resourcename: string (nullable = true)
        |    |    |-- criticity: string (nullable = true)
        |-- data2: array (nullable = true)
        |    |-- element: struct (containsNull = true)
        |    |    |-- col1: struct (nullable = false)
        |    |    |    |-- col1: struct (nullable = false)
        |    |    |    |    |-- v: string (nullable = true)
        |    |    |    |    |-- t: string (nullable = false)
        |    |    |-- resourcename: string (nullable = true)
        |    |    |-- criticity: string (nullable = true)
        |-- data4: array (nullable = true)
        |    |-- element: struct (containsNull = true)
        |    |    |-- col1: struct (nullable = false)
        |    |    |    |-- v: string (nullable = true)
        |    |    |    |-- t: string (nullable = false)
        |    |    |-- v: string (nullable = true)
        |    |    |-- vn: double (nullable = true)
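
For reproducibility, here is a minimal sketch of how a toy frame with this shape could be built; the case-class names and row values below are made up:

     import org.apache.spark.sql.SparkSession

     val spark = SparkSession.builder.master("local[*]").getOrCreate()
     import spark.implicits._

     // Hypothetical case classes mirroring the element structs above
     case class Inner(v: String, t: String)
     case class Col1(col1: Inner)
     case class Elem(col1: Col1, resourcename: String, criticity: String) // data1/data2
     case class Elem4(col1: Inner, v: String, vn: Double)                 // data4

     val df = Seq(
       (1L,
        Seq(Elem(Col1(Inner("a", "x")), "r1", "HIGH")),
        Seq(Elem(Col1(Inner("b", "y")), "r2", "LOW")),
        Seq(Elem4(Inner("c", "z"), "v3", 1.0)))
     ).toDF("context_id", "data1", "data2", "data4")

     df.printSchema()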

I would like a DataFrame like this:

     root
        |-- context_id: long (nullable = true)
        |-- data: array (nullable = true)
        |    |-- element: struct (containsNull = true)
        |    |    |-- col1: struct (nullable = false)
        |    |    |    |-- col1: struct (nullable = false)
        |    |    |    |    |-- v: string (nullable = true)
        |    |    |    |    |-- t: string (nullable = false)
        |    |    |-- resourcename: string (nullable = true)
        |    |    |-- criticity: string (nullable = true)
        |    |-- element: struct (containsNull = true)
        |    |    |-- col1: struct (nullable = false)
        |    |    |    |-- col1: struct (nullable = false)
        |    |    |    |    |-- v: string (nullable = true)
        |    |    |    |    |-- t: string (nullable = false)
        |    |    |-- resourcename: string (nullable = true)
        |    |    |-- criticity: string (nullable = true)
        |    |-- element: struct (containsNull = true)
        |    |    |-- col1: struct (nullable = false)
        |    |    |    |-- v: string (nullable = true)
        |    |    |    |-- t: string (nullable = false)
        |    |    |-- v: string (nullable = true)
        |    |    |-- vn: double (nullable = true)
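
As an aside: starting with Spark 2.4 (released shortly after this question was asked), the built-in concat function also accepts array columns, so columns whose element types match exactly can be merged without any UDF. A minimal sketch:

     import org.apache.spark.sql.functions.{col, concat}

     // data1 and data2 share the same element struct, so concat applies directly.
     // data4 has a different element type and would first have to be rebuilt into
     // the common struct before it could join the concatenation.
     val partial = df.withColumn("data", concat(col("data1"), col("data2")))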

So I tried to concatenate the DataFrame columns using enter link description here, but when I use mutable.WrappedArray[StructType] instead of mutable.WrappedArray[String] it gives me this error:

      Exception in thread "main" scala.MatchError: org.apache.spark.sql.types.StructType (of class scala.reflect.internal.Types$ClassNoArgsTypeRef)

I also tried enter link description here with Seq[Row], which fails as well; I think the udf in that example accepts Seq[Row] as a parameter but does not return Seq[Row]:

         Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Row is not supported
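
For reference, both exceptions point at the same limitation: the udf helper cannot infer a Spark SQL schema for StructType or Row values. There is an overload of udf that takes the return DataType explicitly, which sidesteps the inference. Below is a minimal sketch, assuming a simplified common element schema rather than the exact one above:

     import org.apache.spark.sql.Row
     import org.apache.spark.sql.functions.{col, udf}
     import org.apache.spark.sql.types._

     // Hypothetical element schema shared by data1 and data2
     val elementSchema = StructType(Seq(
       StructField("col1", StructType(Seq(
         StructField("col1", StructType(Seq(
           StructField("v", StringType),
           StructField("t", StringType)
         )))
       ))),
       StructField("resourcename", StringType),
       StructField("criticity", StringType)
     ))

     // Spark cannot derive a schema for Seq[Row], so the return type is
     // passed to udf() explicitly as the second argument.
     val mergeArrays = udf(
       (a: Seq[Row], b: Seq[Row]) => Option(a).getOrElse(Nil) ++ Option(b).getOrElse(Nil),
       ArrayType(elementSchema))

     val merged = df.withColumn("data", mergeArrays(col("data1"), col("data2")))

The same explicit-schema trick would extend to data4 once its elements are mapped onto the common layout.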

Please help!

0 Answers:

There are no answers yet.