Each record in the RDD contains a JSON document. I am using SQLContext to create a DataFrame from the JSON, like this:
val signalsJsonRdd = sqlContext.jsonRDD(signalsJson)
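The schema is shown below. datapayload is an array of items. I want to explode the array of items to get a DataFrame in which each row is one item from datapayload. I tried something based on this answer, but it seems I would need to model the item's entire structure in the case Row(arr: Array[...]) statement. I am probably missing something.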
val payloadDfs = signalsJsonRdd.explode($"data.datapayload"){
case org.apache.spark.sql.Row(arr: Array[String]) => arr.map(Tuple1(_))
}
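The code above throws a scala.MatchError, because the type of the actual Row is very different from Row(arr: Array[String]). There is probably a simple way to do what I want, but I cannot find it. Please help.
Schema below: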
signalsJsonRdd.printSchema()
root
|-- _corrupt_record: string (nullable = true)
|-- data: struct (nullable = true)
| |-- dataid: string (nullable = true)
| |-- datapayload: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- Reading: struct (nullable = true)
| | | | |-- A2DPActive: boolean (nullable = true)
| | | | |-- Accuracy: double (nullable = true)
| | | | |-- Active: boolean (nullable = true)
| | | | |-- Address: string (nullable = true)
| | | | |-- Charging: boolean (nullable = true)
| | | | |-- Connected: boolean (nullable = true)
| | | | |-- DeviceName: string (nullable = true)
| | | | |-- Guid: string (nullable = true)
| | | | |-- HandsFree: boolean (nullable = true)
| | | | |-- Header: double (nullable = true)
| | | | |-- Heading: double (nullable = true)
| | | | |-- Latitude: double (nullable = true)
| | | | |-- Longitude: double (nullable = true)
| | | | |-- PositionSource: long (nullable = true)
| | | | |-- Present: boolean (nullable = true)
| | | | |-- Radius: double (nullable = true)
| | | | |-- SSID: string (nullable = true)
| | | | |-- SSIDLength: long (nullable = true)
| | | | |-- SpeedInKmh: double (nullable = true)
| | | | |-- State: string (nullable = true)
| | | | |-- Time: string (nullable = true)
| | | | |-- Type: string (nullable = true)
| | | |-- Time: string (nullable = true)
| | | |-- Type: string (nullable = true)
Answer 0 (score: 3)
tl;dr The explode function is your friend (or flatMap, my favorite).
The explode function creates a new row for each element in the given array or map column.
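As a rough analogy in plain Scala collections (no Spark involved), the "one row per element" behaviour looks like a flatMap over the array field. The Record type and the sample values here are hypothetical and only illustrate the idea:

```scala
// Plain Scala collections, not Spark; Record is a hypothetical type for illustration
case class Record(dataid: String, datapayload: Seq[String])

val records = Seq(
  Record("a", Seq("x", "y")),
  Record("b", Seq("z"))
)

// flatMap emits one (dataid, item) pair per array element,
// much like explode emits one row per element of the array column
val exploded: Seq[(String, String)] =
  records.flatMap(r => r.datapayload.map(item => (r.dataid, item)))
```

A record whose array is empty contributes no pairs here, which matches how explode produces no rows for an empty array.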
The following should work (after importing org.apache.spark.sql.functions.explode):
signalsJsonRdd.withColumn("element", explode($"data.datapayload"))
See the functions object.
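For completeness, here is a minimal end-to-end sketch of the same idea. It assumes Spark 2.2 or later with a local SparkSession (the question itself uses the older SQLContext API), and the sample JSON record is made up, trimmed to a few fields from the schema in the question. After the explode, the fields of each item can be reached with dot paths, so there is no need to model the whole structure in a pattern match:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

// Assumption: Spark 2.2+ running locally; adjust master/appName for a real cluster
val spark = SparkSession.builder().master("local[*]").appName("explode-sketch").getOrCreate()
import spark.implicits._ // enables the $"..." column syntax and Seq(...).toDS

// Hypothetical sample record with two items in data.datapayload
val sample = Seq(
  """{"data":{"dataid":"d1","datapayload":[{"Type":"gps","Time":"t1","Reading":{"Latitude":1.0,"Longitude":2.0}},{"Type":"gps","Time":"t2","Reading":{"Latitude":3.0,"Longitude":4.0}}]}}"""
).toDS

val signals = spark.read.json(sample)

// One row per item; the item lands in a struct column named "element"
val exploded = signals.withColumn("element", explode($"data.datapayload"))

// Pull out the nested fields of each item with dot paths
val items = exploded.select(
  $"data.dataid",
  $"element.Type",
  $"element.Time",
  $"element.Reading.Latitude",
  $"element.Reading.Longitude"
)

items.show()
```

With the DataFrame from the question, the same withColumn and select calls should apply to signalsJsonRdd directly, provided sqlContext.implicits._ is imported for the $"..." syntax.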