How to create the given DataFrame from the given dataset in Spark using Scala?

Time: 2018-05-16 11:20:46

Tags: scala apache-spark dataframe

Dataset:

GroupID  Name_of_books
101      book1, book2, book3, book4
102      book10, book12, book13, book14

Required output:

101  book1
101  book2
101  book3
101  book4
102  book10
102  book12
102  book13
102  book14

1 Answer:

Answer 0 (score: 0)

You can use the explode function, as in

import org.apache.spark.sql.functions._

val resultDF = df.select($"GroupID", explode($"Name_of_books").as("Name_of_books"))

Or with withColumn:

val resultDF = df.withColumn("Name_of_books", explode($"Name_of_books"))
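
As a usage note: the select version returns only the columns you list, while the withColumn version keeps every other column of df and replaces Name_of_books in place.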

This works if the column is an Array or a Map. If instead you have a single comma-separated string value, you need to split it first and then apply explode, as in

val resultDF = df.select($"GroupID", explode(split($"Name_of_books", ",")))
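
For completeness, here is a minimal, self-contained sketch built from the sample data in the question. Note that splitting on "," leaves a leading space on every element after the first, so the trim step is my addition to clean that up; the SparkSession setup and the resultDF name are only illustrative:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("explode-books")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Recreate the question's dataset as a DataFrame
val df = Seq(
  (101, "book1, book2, book3, book4"),
  (102, "book10, book12, book13, book14")
).toDF("GroupID", "Name_of_books")

// split turns the string into an array, explode emits one row per element,
// and trim strips the space left behind after each comma
val resultDF = df
  .select($"GroupID", explode(split($"Name_of_books", ",")).as("Name_of_books"))
  .withColumn("Name_of_books", trim($"Name_of_books"))

resultDF.show()  // one (GroupID, book) pair per row: 101|book1 ... 102|book14

Note that explode is a generator and cannot be nested inside another expression such as trim, which is why trimming is done in a separate withColumn step.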

Hope this helps!