Exploding a struct in Spark

Date: 2017-11-14 12:26:14

Tags: hadoop apache-spark apache-spark-sql

I have a DataFrame whose schema is shown below. I want to explode the struct so that all of its elements (asin, customerId, eventTime, etc.) become top-level columns of the DataFrame. I tried the explode function, but it works on arrays, not on struct types. Is it possible to flatten a DataFrame with this schema?

root
 |-- data: struct (nullable = true)
 |    |-- asin: string (nullable = true)
 |    |-- customerId: long (nullable = true)
 |    |-- eventTime: long (nullable = true)
 |    |-- marketplaceId: long (nullable = true)
 |    |-- rating: long (nullable = true)
 |    |-- region: string (nullable = true)
 |    |-- type: string (nullable = true)
 |-- uploadedDate: long (nullable = true)

1 answer:

Answer 0 (score: 2)

It's very simple:

val newDF = df.select("uploadedDate", "data.*")

This tells Spark to select uploadedDate and then all sub-fields of the data struct.

Example:

scala> case class A(a: Int, b: Double)
scala> val df = Seq((A(1, 1.0), "1"), (A(2, 2.0), "2")).toDF("data", "uploadedDate")
scala> val newDF = df.select("uploadedDate", "data.*")
scala> newDF.show()
+------------+---+---+
|uploadedDate|  a|  b|
+------------+---+---+
|           1|  1|1.0|
|           2|  2|2.0|
+------------+---+---+

scala> newDF.printSchema()
root
 |-- uploadedDate: string (nullable = true)
 |-- a: integer (nullable = true)
 |-- b: double (nullable = true)
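If you only need some of the nested fields, or want to rename them to avoid column-name collisions, you can select them individually with aliases instead of "data.*". A minimal sketch, assuming the schema from the question (df and the spark-shell setup are taken as given):

```scala
import org.apache.spark.sql.functions.col

// Pick individual sub-fields of the struct and give each an explicit
// top-level column name; field names here come from the question's schema.
val flatDF = df.select(
  col("uploadedDate"),
  col("data.asin").as("asin"),
  col("data.customerId").as("customerId"),
  col("data.eventTime").as("eventTime")
)
```

Both forms rely on the same dotted-path column resolution, so "data.*" is just the shorthand for selecting every sub-field at once.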