How to expand an array column into multiple rows in Scala

Posted: 2018-05-17 15:57:44

Tags: scala databricks

I want to take the following DataFrame:

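// createDF is not part of core Spark; it is assumed here to come from
// the spark-daria library's SparkSession extension
import com.github.mrpowers.spark.daria.sql.SparkSessionExt._
import org.apache.spark.sql.types.{ArrayType, StringType}

val articledDF = spark.createDF(
  List(
    ("article 1", Array("topic 1", "topic 2")),
    ("article 2", Array("topic 1", "topic 3")),
    ("article 3", Array("topic 2"))
  ), List(
    ("article", StringType, true),
    ("topics", ArrayType(StringType, true), true)
  )
)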

which looks like this when displayed:

+---------+---------------------+
| article |topics               |
+---------+---------------------+
|article 1|   [topic 1, topic 2]|
|article 2|   [topic 1, topic 3]|
|article 3|            [topic 2]|
+---------+---------------------+

and expand the topics column so that each topic gets its own row, like this:

+---------+-----------+
| article |topic      |
+---------+-----------+
|article 1|   topic 1 |
|article 1|   topic 2 |
|article 2|   topic 1 |
|article 2|   topic 3 |
|article 3|   topic 2 |
+---------+-----------+

I'd be glad to learn how to do this.
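The requested reshaping is one-to-many: every (article, topics) row turns into one row per topic, with the article value repeated. As a conceptual sketch only (plain Scala collections, no Spark involved; the object and method names are hypothetical), that is exactly what flatMap does:

```scala
object FlattenTopics {
  // Same data as the question, as plain Scala tuples
  val articles: List[(String, List[String])] = List(
    ("article 1", List("topic 1", "topic 2")),
    ("article 2", List("topic 1", "topic 3")),
    ("article 3", List("topic 2"))
  )

  // Emit one (article, topic) pair per topic; an article with an
  // empty topic list contributes no pairs at all
  def flatten(rows: List[(String, List[String])]): List[(String, String)] =
    rows.flatMap { case (article, topics) => topics.map(topic => (article, topic)) }

  def main(args: Array[String]): Unit =
    flatten(articles).foreach { case (article, topic) => println(s"$article | $topic") }
}
```

Note that an empty list simply disappears from the result, which mirrors how Spark's explode treats empty arrays.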

1 Answer:

Answer 0 (score: 2)

Use the explode function from org.apache.spark.sql.functions:

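import org.apache.spark.sql.functions._
import spark.implicits._  // enables the $"columnName" syntax

// explode emits one output row per element of the topics array,
// repeating the article value on each of those rows
articledDF.select($"article", explode($"topics") as "topic")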
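Below is a self-contained sketch of the same approach. It assumes a local SparkSession (outside Databricks) and builds the data with Spark's own toDF instead of createDF, so no extra library is needed. The object and method names, and the extra "article 4" row with an empty topic list, are hypothetical additions. That row shows a difference worth knowing: explode drops rows whose array is empty or null, while explode_outer (available since Spark 2.2) keeps them with a null topic.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, explode_outer}

object ExplodeTopics {
  // Question's data built with toDF (topics is inferred as array<string>),
  // plus a hypothetical "article 4" that has no topics
  def articles(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("article 1", Seq("topic 1", "topic 2")),
      ("article 2", Seq("topic 1", "topic 3")),
      ("article 3", Seq("topic 2")),
      ("article 4", Seq.empty[String])
    ).toDF("article", "topics")
  }

  // explode: one row per array element; empty or null arrays produce no rows
  def explodeTopics(df: DataFrame): DataFrame =
    df.select(col("article"), explode(col("topics")).as("topic"))

  // explode_outer: same, but an empty or null array yields one row with topic = null
  def explodeTopicsOuter(df: DataFrame): DataFrame =
    df.select(col("article"), explode_outer(col("topics")).as("topic"))

  def main(args: Array[String]): Unit = {
    // local[*] master is an assumption for running outside Databricks
    val spark = SparkSession.builder().appName("explode-topics").master("local[*]").getOrCreate()
    val df = articles(spark)
    explodeTopics(df).show()      // article 4 is absent
    explodeTopicsOuter(df).show() // article 4 appears once, with a null topic
    spark.stop()
  }
}
```

In a Databricks notebook, the predefined spark session can be passed to articles directly instead of running main.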