How to convert a wide DataFrame to a vertical (long) DataFrame in Spark Scala

Time: 2020-11-10 20:28:22

Tags: scala apache-spark

Say I have this initial DataFrame:

  val df_temp = Seq(("Mike",23,"NY","CA","FL"),("Bill",25,"CA","TX","MA"),("Kevin",22,"NY","NJ","CA")).toDF("Name","Age","State1","State2","State3")

I want to convert it to this DataFrame, with one row per person and state:

  val df_temp2 = Seq(("Mike",23,"NY"),("Mike",23,"CA"),("Mike",23,"FL"),("Bill",25,"CA"),("Bill",25,"TX"),("Bill",25,"MA"),("Kevin",22,"NY"),("Kevin",22,"NJ"),("Kevin",22,"CA")).toDF("Name","Age","State")

How can I do this?

Thank you very much, and have a nice day!

1 Answer:

Answer 0: (score: 1)

Here is the code:

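import org.apache.spark.sql.functions.{array, explode}
// Pack the three state columns into one array column, then emit one row per array element.
df_temp.withColumn("States", array($"State1", $"State2", $"State3"))
  .select($"Name", $"Age", explode($"States").as("State"))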

Reference for the functions used: array, explode
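For completeness, here is a minimal self-contained sketch of the same array-plus-explode approach, written so it can run outside spark-shell. The local SparkSession setup, the names `dfTemp`, `stateCols` and `dfLong`, and the choice to pick the state columns by the "State" name prefix rather than listing them by hand are all assumptions added for illustration; they are not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, explode}

// Assumption: a local session for testing; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("wide-to-long")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the question.
val dfTemp = Seq(
  ("Mike", 23, "NY", "CA", "FL"),
  ("Bill", 25, "CA", "TX", "MA"),
  ("Kevin", 22, "NY", "NJ", "CA")
).toDF("Name", "Age", "State1", "State2", "State3")

// Assumption: every column to unpivot has a name starting with "State".
val stateCols = dfTemp.columns.filter(_.startsWith("State")).map(col)

// Pack the state columns into one array column, then emit one row per element.
val dfLong = dfTemp.select(
  $"Name",
  $"Age",
  explode(array(stateCols: _*)).as("State")
)

dfLong.show()
```

With the sample data this should yield nine rows, three per person. Selecting the columns by prefix avoids editing the query when a State4 column appears, at the cost of also picking up any unrelated column whose name happens to start with "State".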