How do I split a dataset that contains
movie title | movie genres
Toy Story | Animation|Children|Comedy
into
movie title | movie genres
Toy Story | Animation
Toy Story | Children
Toy Story | Comedy
without using the explode method, since it is now deprecated?
Answer 0 (score: 0)
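It is "deprecated" in the sense that you should no longer call def explode directly on a Dataset/DataFrame. But the documentation also says "you can explode columns either using functions.explode() or flatMap()". Here is a Scala example that I quickly tested in the Scala REPL, but I am sure the same strategy applies to Java: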
import spark.implicits._
import org.apache.spark.sql.functions.{explode, split}
case class movie(title: String, genre: String)
val m1 = new movie("Toy Story", "Animation|Children|Comedy")
val df = Seq(m1).toDF()
df.show(5, false)
+---------+-------------------------+
|title    |genre                    |
+---------+-------------------------+
|Toy Story|Animation|Children|Comedy|
+---------+-------------------------+
val df2 = df.select('title, explode(split('genre, """\|""")).as("granular_genre"))
df2.show(5, false)
+---------+--------------+
|title    |granular_genre|
+---------+--------------+
|Toy Story|Animation     |
|Toy Story|Children      |
|Toy Story|Comedy        |
+---------+--------------+
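
The flatMap() route mentioned in the documentation quote can be sketched along the same lines. This is only a minimal sketch, not something from the original answer: the case classes Movie and MovieGenre, the object FlatMapExplode, and the helper splitGenres are hypothetical names introduced for illustration, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case classes for this sketch (not part of the original answer)
case class Movie(title: String, genre: String)
case class MovieGenre(title: String, granular_genre: String)

object FlatMapExplode {
  // Pure helper: emit one MovieGenre per pipe-separated genre.
  // split('|') takes a Char, so no regex escaping is needed here.
  def splitGenres(m: Movie): Seq[MovieGenre] =
    m.genre.split('|').toSeq.map(g => MovieGenre(m.title, g))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for a quick experiment
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatMapExplode")
      .getOrCreate()
    import spark.implicits._

    val ds: Dataset[Movie] = Seq(Movie("Toy Story", "Animation|Children|Comedy")).toDS()

    // flatMap turns each input row into zero or more output rows
    val ds2: Dataset[MovieGenre] = ds.flatMap(m => splitGenres(m))
    ds2.show(5, false)

    spark.stop()
  }
}
```

This should give one row per genre, like the functions.explode() version above. The typed flatMap can be handy when the per-row logic is more involved than a simple split, while functions.explode() stays within the untyped column API.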