Spark: convert a struct / dictionary to an array of structs / dictionaries

Time: 2019-08-17 15:38:46

Tags: scala apache-spark apache-spark-sql

My Spark SQL and Scala code:

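import org.apache.spark.sql.functions.struct

// Load the source columns; nothing is reassigned, so val is enough
val df = spark.sql(
     """
             |SELECT id, a, b, c, d
             |FROM default.table
      """.stripMargin)

// Pack columns a..d into a single struct column named "map"
val grouped_df = df.withColumn("map", struct("a", "b", "c", "d"))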

Output of grouped_df:

{
  "id": 41286786,
  "map": {
    "a": "",
    "b": "724",
    "c": "7425",
    "d": ""
  }
}

How can I get the following output, that is, convert grouped_df to:

{
  "id": 41286786,
  "array": [
    { "name": "b", "value": "724" },
    { "name": "c", "value": "7425" }
  ]
}

How can I do this in Spark SQL or with a UDF?

1 answer:

Answer 0 (score: 1)

Here is how to do it in Scala with the DataFrame API (no UDF needed):

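import org.apache.spark.sql.functions.{array, struct, lit}
import spark.implicits._  // provides the $"..." column syntax; assumes a SparkSession named spark

val result = grouped_df
  .select(
    $"id",
    // One (name, value) struct per field to keep; b and c are hard-coded here
    array(
      struct(lit("b").alias("name"), $"map.b".alias("value")),
      struct(lit("c").alias("name"), $"map.c".alias("value"))
    ).alias("array")
  )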
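
The snippet above hard-codes b and c, whereas the desired output in the question seems to drop a and d because they are empty. If that is the intent, a more general variant could build one struct per column and then remove the empty ones. The following is only a sketch under a few assumptions: Spark 2.4 or later (for the SQL higher-order function `filter`), all value columns being strings, a local SparkSession, and a single in-memory row standing in for default.table. The names `StructToArraySketch`, `toNameValueArray`, and `all_pairs` are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, expr, lit, struct}

object StructToArraySketch {

  // Hypothetical helper: turns the given string columns into an array of
  // (name, value) structs and keeps only entries whose value is non-empty.
  // Assumes the input has an "id" column and that every listed field is a string.
  def toNameValueArray(df: DataFrame, fields: Seq[String]): DataFrame = {
    // One struct<name, value> per requested column
    val pairs = fields.map(f => struct(lit(f).alias("name"), col(f).alias("value")))

    df.select(col("id"), array(pairs: _*).alias("all_pairs"))
      // filter(...) with a lambda is a Spark SQL higher-order function (2.4+);
      // null values are dropped too, because null <> '' does not evaluate to true
      .select(col("id"), expr("filter(all_pairs, x -> x.value <> '')").alias("array"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("struct-to-array-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for default.table, using the values from the question
    val df = Seq((41286786L, "", "724", "7425", "")).toDF("id", "a", "b", "c", "d")

    toNameValueArray(df, Seq("a", "b", "c", "d")).toJSON.show(truncate = false)

    spark.stop()
  }
}
```

This works from the flat columns, so the intermediate "map" struct is not needed; starting from grouped_df instead, `col(f)` would become `col(s"map.$f")`. The same `filter(...)` expression can also be used directly in a `spark.sql` query, which may be closer to the Spark SQL route the question asks about.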