My Spark SQL and Scala code:
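import org.apache.spark.sql.functions.struct

// Load the source columns
val df = spark.sql(
  """
    |SELECT id, a, b, c, d
    |FROM default.table
  """.stripMargin)
// Pack a, b, c, d into a single struct column named "map"
val grouped_df = df.withColumn("map", struct("a", "b", "c", "d"))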
Output of grouped_df:
{
  "id": 41286786,
  "map": {
    "a": "",
    "b": "724",
    "c": "7425",
    "d": ""
  }
}
How can I get the following output, that is, transform grouped_df into:
{
  "id": 41286786,
  "array": [
    { "name": "b", "value": "724" },
    { "name": "c", "value": "7425" }
  ]
}
How can I do this in Spark SQL or with a UDF?
Answer 0 (score: 1)
Here is how to do it in Scala with the DataFrame API (no UDF needed, naturally):
import org.apache.spark.sql.functions.{array, lit, struct}
import spark.implicits._ // enables the $"..." column syntax

val result = grouped_df
  .select(
    $"id",
    // one {name, value} struct per field to keep
    array(
      struct(lit("b").alias("name"), $"map.b".alias("value")),
      struct(lit("c").alias("name"), $"map.c".alias("value"))
    ).alias("array")
  )
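The snippet above hardcodes the fields b and c, so it does not itself drop fields whose value is empty. If the goal is to keep only the non-empty fields in each row, one possible variation is sketched below. It assumes Spark 2.4 or later (for the SQL higher-order function filter) and that all value columns share the same type. The object name NameValueArray, the helper toNameValueArray, and the intermediate column all_pairs are hypothetical names introduced for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, expr, lit, struct}

object NameValueArray {
  // Hypothetical helper: builds one {name, value} struct per listed field of
  // the "map" column, then keeps only entries whose value is non-null and non-empty.
  // Assumes every listed field has the same data type (here: string).
  def toNameValueArray(df: DataFrame, valueCols: Seq[String]): DataFrame = {
    val pairs = valueCols.map { c =>
      struct(lit(c).alias("name"), col(s"map.$c").alias("value"))
    }
    df.select(col("id"), array(pairs: _*).alias("all_pairs"))
      .select(
        col("id"),
        // filter(...) is a Spark SQL higher-order function (Spark 2.4+)
        expr("filter(all_pairs, p -> p.value IS NOT NULL AND p.value != '')")
          .alias("array")
      )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("name-value-array")
      .getOrCreate()
    import spark.implicits._

    // Sample row mirroring the question's data
    val grouped = Seq((41286786L, "", "724", "7425", ""))
      .toDF("id", "a", "b", "c", "d")
      .withColumn("map", struct("a", "b", "c", "d"))

    toNameValueArray(grouped, Seq("a", "b", "c", "d")).toJSON.show(false)
    spark.stop()
  }
}
```

With the question's DataFrame this would be called as NameValueArray.toNameValueArray(grouped_df, Seq("a", "b", "c", "d")). The same filter(...) expression can also be used directly inside a spark.sql query, which covers the Spark SQL half of the question without a UDF.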