How do I explode an array of structs?

Time: 2018-12-20 14:40:39

Tags: apache-spark apache-spark-sql

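I am working with a JSON object and want to convert object.hours into a relational table using a Spark SQL DataFrame/Dataset.

I tried using "explode", but it does not really support this "array of structs".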

The JSON object that I want to turn into a relational table is below:

{
  "business_id": "abc",
  "full_address": "random_address",
  "hours": {
    "Monday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Tuesday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Friday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Wednesday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Thursday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Sunday": {
      "close": "00:00",
      "open": "11:00"
    },
    "Saturday": {
      "close": "02:00",
      "open": "11:00"
    }
  }
}
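
To see why "explode" rejects this column, it can help to load the document and look at the inferred schema. The sketch below is only an illustration: the object name HoursSchema, the local master setting, and the shortened two-day sample are assumptions made here, not part of the original question. It relies on `spark.read.json` accepting a `Dataset[String]`, which is available from Spark 2.2 onward.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object HoursSchema {
  // Shortened copy of the sample document (two days only, to keep the sketch small)
  val sampleJson: String =
    """{"business_id":"abc","full_address":"random_address",""" +
    """"hours":{"Monday":{"close":"02:00","open":"11:00"},""" +
    """"Sunday":{"close":"00:00","open":"11:00"}}}"""

  // Parse the JSON string into a DataFrame, letting Spark infer the schema
  def load(spark: SparkSession): DataFrame = {
    import spark.implicits._ // provides toDS() on Seq[String]
    spark.read.json(Seq(sampleJson).toDS())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run, for illustration only
      .appName("hours-schema")
      .getOrCreate()

    // "hours" is inferred as a struct with one nested struct per day,
    // not as an array or a map
    load(spark).printSchema()

    spark.stop()
  }
}
```

Since `explode` only accepts array or map columns, passing the "hours" struct to it directly fails at analysis time; the struct fields have to be reshaped into an array (or the column read as a map) first.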

1 answer:

Answer 0 (score: 3)

You can do it with the following trick:

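import org.apache.spark.sql.functions.{array, col, explode, lit, struct}
import org.apache.spark.sql.types.StructType
import spark.implicits._ // enables the $"..." column syntax (pre-imported in spark-shell)

// Read the day names from the fields of the "hours" struct in the schema
val days = df.schema("hours")
  .dataType
  .asInstanceOf[StructType]
  .fieldNames

val solution = df
  .select(
    $"business_id",
    $"full_address",
    // Build one (day, open_time, close_time) struct per day, collect them
    // into an array, and explode that array into one row per day
    explode(
      array(
        days.map(d => struct(
          lit(d).as("day"),
          col(s"hours.$d.open").as("open_time"),
          col(s"hours.$d.close").as("close_time")
        )): _*
      )
    )
  )
  // explode names its output column "col"; expand that struct into columns
  .select($"business_id", $"full_address", $"col.*")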

scala> solution.show
+-----------+--------------+---------+---------+----------+
|business_id|  full_address|      day|open_time|close_time|
+-----------+--------------+---------+---------+----------+
|        abc|random_address|   Friday|    11:00|     02:00|
|        abc|random_address|   Monday|    11:00|     02:00|
|        abc|random_address| Saturday|    11:00|     02:00|
|        abc|random_address|   Sunday|    11:00|     00:00|
|        abc|random_address| Thursday|    11:00|     02:00|
|        abc|random_address|  Tuesday|    11:00|     02:00|
|        abc|random_address|Wednesday|    11:00|     02:00|
+-----------+--------------+---------+---------+----------+
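
A possible alternative, not taken from the answer above, is to read "hours" as a map instead of a struct, because `explode` does accept map columns and then emits `key` and `value` columns. The sketch below assumes the JSON is available as strings and that every day carries only "open" and "close"; the object name HoursAsMap and the method explodeHours are hypothetical names chosen for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{MapType, StringType, StructField, StructType}

object HoursAsMap {
  // Struct holding the opening and closing time of a single day
  val dayHours: StructType = StructType(Seq(
    StructField("close", StringType),
    StructField("open", StringType)
  ))

  // Explicit schema: "hours" is declared as a map from day name to dayHours,
  // overriding the struct that schema inference would produce
  val schema: StructType = StructType(Seq(
    StructField("business_id", StringType),
    StructField("full_address", StringType),
    StructField("hours", MapType(StringType, dayHours))
  ))

  // Hypothetical helper: one JSON document per element of jsonRecords
  def explodeHours(spark: SparkSession, jsonRecords: Seq[String]): DataFrame = {
    import spark.implicits._ // provides toDS() on Seq[String]
    val df = spark.read.schema(schema).json(jsonRecords.toDS())

    df
      // Exploding a map yields two columns named "key" and "value"
      .select(col("business_id"), col("full_address"), explode(col("hours")))
      .select(
        col("business_id"),
        col("full_address"),
        col("key").as("day"),
        col("value.open").as("open_time"),
        col("value.close").as("close_time")
      )
  }
}
```

Calling `HoursAsMap.explodeHours(spark, Seq(jsonString))` returns a DataFrame with the columns business_id, full_address, day, open_time and close_time. The trade-off is that the schema has to be written by hand, whereas the approach in the answer derives the day names from the inferred schema.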