I am working with a Spark DataFrame that contains several array columns, and I want to produce a single array column that contains the elements of all of them.
root
|-- context_id: long (nullable = true)
|-- data1: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- col1: struct (nullable = false)
| | | | |-- v: string (nullable = true)
| | | | |-- t: string (nullable = false)
| | |-- resourcename: string (nullable = true)
| | |-- criticity: string (nullable = true)
|-- data2: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- col1: struct (nullable = false)
| | | | |-- v: string (nullable = true)
| | | | |-- t: string (nullable = false)
| | |-- resourcename: string (nullable = true)
| | |-- criticity: string (nullable = true)
|-- data4: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- v: string (nullable = true)
| | | |-- t: string (nullable = false)
| | |-- v: string (nullable = true)
| | |-- vn: double (nullable = true)
I would like a DataFrame like this:
root
|-- context_id: long (nullable = true)
|-- data: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- col1: struct (nullable = false)
| | | | |-- v: string (nullable = true)
| | | | |-- t: string (nullable = false)
| | |-- resourcename: string (nullable = true)
| | |-- criticity: string (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- col1: struct (nullable = false)
| | | | |-- v: string (nullable = true)
| | | | |-- t: string (nullable = false)
| | |-- resourcename: string (nullable = true)
| | |-- criticity: string (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col1: struct (nullable = false)
| | | |-- v: string (nullable = true)
| | | |-- t: string (nullable = false)
| | |-- v: string (nullable = true)
| | |-- vn: double (nullable = true)
So I tried to concatenate the array columns following the approach from enter link description here, but when I use mutable.WrappedArray[StructType] instead of mutable.WrappedArray[String] it gives me this error:
Exception in thread "main" scala.MatchError: org.apache.spark.sql.types.StructType (of class scala.reflect.internal.Types$ClassNoArgsTypeRef)
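In case it helps, this is roughly what the failing attempt looks like (the DataFrame `df` and the column names are simplified from my real code):

```scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.StructType
import scala.collection.mutable

// Roughly my attempt: a udf that concatenates the two arrays.
// It fails when the udf is created, because StructType is a schema
// descriptor, not a runtime element type Spark can map to Scala.
val concatArrays = udf(
  (a: mutable.WrappedArray[StructType],
   b: mutable.WrappedArray[StructType]) => a ++ b)

val result = df.withColumn("data", concatArrays($"data1", $"data2"))
```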
I also tried enter link description here with Seq[Row], and it gives the same kind of error; I think the udf in that example accepts a Seq[Row] as an argument but does not return one:
Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Row is not supported
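The Row-based variant I tried looks roughly like this (again simplified):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Same idea with Row: Spark cannot derive a return schema for a
// generic Row, hence "Schema for type org.apache.spark.sql.Row is
// not supported".
val concatRows = udf((a: Seq[Row], b: Seq[Row]) => a ++ b)

val result = df.withColumn("data", concatRows($"data1", $"data2"))
```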
Please help!