How to transform a row into another row in a Spark Dataset

Time: 2018-10-31 03:38:18

Tags: apache-spark dictionary dataset row

For example, I have a table with the following schema:

id int,
name string,
score int

I want to convert it to:

id int,
attribute struct<name string,score int>

2 answers:

Answer 0 (score: 0)

You can do this with a UDF. First, define the UDF:

import org.apache.spark.sql.functions.udf

val toArray = udf((value1: String, value2: String) => List(value1, value2))

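Then apply it to the two columns. Note that the resulting "attribute" column is an array<string>, not the struct asked for in the question: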

val df1 = df.withColumn("attribute", toArray(df.col("name"), df.col("score")))

df1.select("id","attribute").show()
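
If a UDF is still preferred but the column should be a struct rather than an array, one option is to have the UDF return a case class, which Spark encodes as a struct. The following is a minimal sketch, not part of the original answer: the case class Attribute, the UDF name toAttribute, the sample row, and the local SparkSession are all assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Assumption: a local session for experimentation; in spark-shell, `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("udf-struct-sketch").getOrCreate()
import spark.implicits._

// Hypothetical case class; its field names and types become the struct's fields
case class Attribute(name: String, score: Int)

// Sample data with the schema from the question
val df = List((1, "any", 10)).toDF("id", "name", "score")

// A UDF that returns a case class produces a struct column
val toAttribute = udf((name: String, score: Int) => Attribute(name, score))

val df1 = df
  .withColumn("attribute", toAttribute(col("name"), col("score")))
  .select("id", "attribute")

df1.printSchema()
```

The built-in struct function is usually the simpler choice, since it avoids the serialization overhead of a UDF; the sketch only shows that a UDF can also yield a struct.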

Answer 1 (score: 0)

You can use the built-in "struct" function:

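import org.apache.spark.sql.functions.struct
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

val originalDF = List((1, "any", 10)).toDF("id", "name", "score")
val transformed = originalDF.select($"id", struct($"name", $"score").alias("attribute"))
transformed.printSchema()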

Output:

root
 |-- id: integer (nullable = false)
 |-- attribute: struct (nullable = false)
 |    |-- name: string (nullable = true)
 |    |-- score: integer (nullable = false)
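
Once the columns are packed into a struct, the nested fields can still be read with dot notation, or expanded back into top-level columns. The following is a minimal sketch under the same sample data; the local SparkSession and the names `names` and `flattened` are assumptions added for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

// Assumption: a local session for experimentation; in spark-shell, `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("struct-access-sketch").getOrCreate()
import spark.implicits._

val originalDF = List((1, "any", 10)).toDF("id", "name", "score")
val transformed = originalDF.select($"id", struct($"name", $"score").alias("attribute"))

// Read a single nested field with dot notation
val names = transformed.select($"attribute.name")

// Expand every field of the struct back into top-level columns
val flattened = transformed.select($"id", $"attribute.*")

names.show()
flattened.printSchema()
```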