Dataset:
GroupID Name_of_books
101 book1, book2, book3, book4
102 book10, book12, book13, book14
Required output:
101 book1
101 book2
101 book3
101 book4
102 book10
102 book12
102 book13
102 book14
Answer (score: 0)
You can use the explode function as:
import org.apache.spark.sql.functions._
import spark.implicits._  // needed for the $"colName" column syntax

val resultDF = df.select($"GroupID", explode($"Name_of_books").as("Name_of_books"))
Or with withColumn:
val resultDF = df.withColumn("Name_of_books", explode($"Name_of_books"))
This works if the column is an Array or Map type.
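For example, here is a minimal sketch of the array case; the SparkSession setup and the sample data below are assumptions added for illustration, with column names taken from the question:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("explode-example").master("local[*]").getOrCreate()
import spark.implicits._

// Sample DataFrame where Name_of_books is already an array column
val arrDF = Seq(
  (101, Seq("book1", "book2", "book3", "book4")),
  (102, Seq("book10", "book12", "book13", "book14"))
).toDF("GroupID", "Name_of_books")

// explode produces one row per array element, keeping GroupID on each row
val resultDF = arrDF.select($"GroupID", explode($"Name_of_books").as("Name_of_books"))

// Equivalent result using withColumn
val resultDF2 = arrDF.withColumn("Name_of_books", explode($"Name_of_books"))

resultDF.show()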
If you have a comma-separated string value, you need to split it first and then apply explode, as:
val resultDF = df.select($"GroupID", explode(split($"Name_of_books", ",")).as("Name_of_books"))
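Putting it together on the sample data from the question (reusing the imports and SparkSession from the sketch above), a minimal sketch might look like this; the trim call is an added assumption to drop the space that follows each comma in the sample values:

// Sample data from the question, stored as a comma-separated string
val strDF = Seq(
  (101, "book1, book2, book3, book4"),
  (102, "book10, book12, book13, book14")
).toDF("GroupID", "Name_of_books")

// split on commas, explode to one row per book, then trim the leftover spaces
val result = strDF
  .select($"GroupID", explode(split($"Name_of_books", ",")).as("Name_of_books"))
  .withColumn("Name_of_books", trim($"Name_of_books"))

result.show()

This should reproduce the required output, one (GroupID, book) pair per row.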
Hope this helps!