Answer 0 (score: 4)
Make sure your column names are unique; then you can explode an array built from the cast columns:
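import org.apache.spark.sql.functions._
table
  // select cannot mix String and Column arguments, so every argument is a Column
  .select(col("id"), col("movie"), explode(array("cast1", "cast2", "cast3", "cast4")).as("cast"))
  // drop the rows produced by null cast entries
  .where(col("cast").isNotNull)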
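If the DataFrame arrives with repeated column names, one option is to rename them before exploding. The following is a minimal plain-Scala sketch, not part of Spark: `UniqueNames` and `uniqueNames` are hypothetical names introduced here for illustration, and the `_1`, `_2` suffix scheme is an assumption.

```scala
object UniqueNames {
  // Hypothetical helper: keeps the first occurrence of a name unchanged and
  // appends "_<n>" to the n-th repeat. It does not guard against a generated
  // name colliding with a name that already exists in the input.
  def uniqueNames(names: Seq[String]): Seq[String] = {
    val seen = scala.collection.mutable.Map.empty[String, Int]
    names.map { name =>
      val count = seen.getOrElse(name, 0)
      seen(name) = count + 1
      if (count == 0) name else s"${name}_$count"
    }
  }

  def main(args: Array[String]): Unit = {
    val original = Seq("ID", "Movie", "Cast1", "Cast2", "Cast3", "Cast2")
    // prints List(ID, Movie, Cast1, Cast2, Cast3, Cast2_1)
    println(uniqueNames(original))
  }
}
```

With Spark available, the renamed columns could then be applied with something like `table.toDF(UniqueNames.uniqueNames(table.columns.toSeq): _*)`, after which the explode above can refer to each cast column unambiguously.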
Answer 1 (score: 0)
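table.groupBy("ID", "Movie")
  // collect_list accepts a single column, so the cast columns are packed into an
  // array first and the nested result is flattened (flatten requires Spark 2.4+)
  .agg(flatten(collect_list(array("Cast1", "Cast2", "Cast3", "Cast4"))).as("cast"))
  .withColumn("cast", explode(col("cast")))
  .where(col("cast").isNotNull)
// Note: you should always avoid duplicate column names within the same DataFrame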
Answer 2 (score: 0)
With "union":
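// assumes a SparkSession named `spark` is in scope, as in spark-shell
import spark.implicits._
import org.apache.spark.sql.functions.col

val table = List(
  (101, "ABC", "A", "B", "C", "D"),
  (102, "XZY", "G", "J", null, null))
  .toDF("ID", "Movie", "Cast1", "Cast2", "Cast3", "Cast4")
val columnsToUnion = List("Cast1", "Cast2", "Cast3", "Cast4")
// one DataFrame per cast column (nulls dropped), then union them all
val result = columnsToUnion.map(name => table.select($"ID", $"Movie", col(name).alias("Cast")).where(col(name).isNotNull))
  .reduce(_ union _)
result.show(false)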
Output:
+---+-----+----+
|ID |Movie|Cast|
+---+-----+----+
|101|ABC  |A   |
|102|XZY  |G   |
|101|ABC  |B   |
|102|XZY  |J   |
|101|ABC  |C   |
|101|ABC  |D   |
+---+-----+----+
Note: the table must not contain multiple columns with the same name; this assumes the column names follow the pattern "Cast[i]".
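If the cast columns really do follow that naming pattern, the list of columns to union does not have to be written out by hand. A minimal plain-Scala sketch, assuming the pattern means "Cast" followed by one or more digits; `CastColumns` and `castColumns` are hypothetical names, not part of Spark:

```scala
object CastColumns {
  // Hypothetical helper: keeps only names of the form Cast<digits>,
  // preserving their original order.
  def castColumns(names: Seq[String]): Seq[String] =
    names.filter(_.matches("Cast\\d+"))

  def main(args: Array[String]): Unit = {
    val all = Seq("ID", "Movie", "Cast1", "Cast2", "Cast3", "Cast4")
    // prints List(Cast1, Cast2, Cast3, Cast4)
    println(castColumns(all))
  }
}
```

With Spark available, the hard-coded list could then be replaced by something like `CastColumns.castColumns(table.columns.toSeq)`.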