Flattening a nested data structure in Spark

Time: 2019-06-18 07:48:20

Tags: json scala apache-spark

I have the following DataFrame:

 df.show()
+--------------------+--------------------+----+--------+---------+--------------------+--------+--------------------+
|             address|         coordinates|  id|latitude|longitude|                name|position|                json|
+--------------------+--------------------+----+--------+---------+--------------------+--------+--------------------+
|Balfour St / Brun...|[-27.463431, 15.352472|79.0|    null|     null|79 - BALFOUR ST /...|    null|[-27.463431, 153.041031]|
+--------------------+--------------------+----+--------+---------+--------------------+--------+--------------------+

I want to flatten the json column. This is what I tried:

import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

val jsonSchema = StructType(Seq(
  StructField("latitude", DoubleType, nullable = true),
  StructField("longitude", DoubleType, nullable = true)))

val a = df.select(from_json(col("json"), jsonSchema) as "content")

However, a.show() gives me:
+-------+
|content|
+-------+
|   null|
+-------+

Any idea how to parse the json column correctly so that the content column in the second DataFrame (a) is not null?

The raw data looks like this:

{
    "id": 79,
    "name": "79 - BALFOUR ST / BRUNSWICK ST",
    "address": "Balfour St / Brunswick St",
    "coordinates": {
      "latitude": -27.463431,
      "longitude": 153.041031
    }
  }

Thanks a lot.

1 answer:

Answer 0 (score: 0)

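The problem is your schema. You are trying to access values of a nested object as if they were regular top-level values: in the raw JSON, latitude and longitude sit inside coordinates, so the schema has to nest them in the same way. I changed your schema accordingly, and it works for me.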

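import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._
import spark.implicits._ // provides the Encoder that createDataset needs for strings

// One-row Dataset[String] holding the raw JSON document
val df = spark.createDataset(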
  """
    |{
    |    "id": 79,
    |    "name": "79 - BALFOUR ST / BRUNSWICK ST",
    |    "address": "Balfour St / Brunswick St",
    |    "coordinates": {
    |      "latitude": -27.463431,
    |      "longitude": 153.041031
    |    }
    |  }
  """.stripMargin :: Nil)

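// "coordinates" is a nested object, so it gets its own nested StructType
val jsonSchema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("coordinates",
    StructType(Seq(
      StructField("latitude", DoubleType, nullable = true),
      StructField("longitude", DoubleType, nullable = true)
    )), nullable = true)
))

// A Dataset[String] exposes its single column under the name "value"
val a = df.select(from_json(col("value"), jsonSchema) as "content")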

a.show(false)

Output:

+--------------------------------------------------------+
|content                                                 |
+--------------------------------------------------------+
|[79 - BALFOUR ST / BRUNSWICK ST,[-27.463431,153.041031]]|
+--------------------------------------------------------+
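
The output above still holds everything in a single struct column. Since the question asks for a flattened result, here is a minimal sketch of one way to pull the nested fields up into top-level columns using dotted column paths. It assumes a local SparkSession, and the names raw, parsed, and flat are illustrative, not from the original post. The JSON is the same record, written on one line.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

// Assumption: a local session for illustration; in spark-shell, `spark` already exists
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("flatten-example")
  .getOrCreate()
import spark.implicits._

// Same record as in the question, as a one-row Dataset[String] (column name: "value")
val raw = Seq(
  """{"id": 79, "name": "79 - BALFOUR ST / BRUNSWICK ST", "address": "Balfour St / Brunswick St", "coordinates": {"latitude": -27.463431, "longitude": 153.041031}}"""
).toDS()

// Schema mirrors the nesting of the JSON: coordinates is a struct of two doubles
val jsonSchema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("coordinates", StructType(Seq(
    StructField("latitude", DoubleType, nullable = true),
    StructField("longitude", DoubleType, nullable = true)
  )), nullable = true)
))

// Parse the JSON string into a single struct column named "content"
val parsed = raw.select(from_json(col("value"), jsonSchema) as "content")

// Flatten: address nested fields with dotted paths and give each a top-level name
val flat = parsed.select(
  col("content.name").as("name"),
  col("content.coordinates.latitude").as("latitude"),
  col("content.coordinates.longitude").as("longitude")
)

flat.show(false)
```

If every field of a struct should be promoted without listing each one, selecting col("content.*") expands one level of nesting. Here coordinates would still be a struct after that step, so a second select on "coordinates.*" would be needed.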