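I am using a Databricks notebook with Spark SQL on a cluster of type "Community Optimized, Spark 2.1 (Auto-updating, Scala 2.11)" (i.e. a free Community account). I have defined the following schema for the table response_raw: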
root
|-- TrackingLabel: string (nullable = true)
|-- Json: string (nullable = true)
The TrackingLabel column contains a value like this:
000009FE-AA24-E511-8766-00505692010A
The Json column contains a value like this:
{
  "transaction": {
    ...
    "generatedInMillis": 24
  },
  "request": {
    "trackingLabel": "4f10225b-078d-11e7-8a18-020a01f809ab",
    ...
  },
  "response": {
    "locations": [
      {
        "trackingLabel": "0",
        ...
        "inputDelta": {
          "grade": {
            "letter": "B-",
            "number": 80,
            ...
          }
        }
      }
    ]
  }
}
I then defined a new notebook cell:
drop table if exists response_json;
create table response_json as
SELECT rr.TrackingLabel
, v1.transaction as TransactionJson
, v1.request as RequestJson
, v1.response as ResponseJson
FROM response_raw AS rr
LATERAL VIEW json_tuple(rr.json, 'transaction', 'request', 'response') v1 AS transaction, request, response
After executing the cell, the schema shows as:
root
|-- TrackingLabel: string (nullable = true)
|-- TransactionJson: string (nullable = true)
|-- RequestJson: string (nullable = true)
|-- ResponseJson: string (nullable = true)
I then created a new notebook cell with this:
drop table if exists response_json_locations;
create table response_json_locations as
SELECT rj.TrackingLabel
, v1.locations
FROM response_json AS rj
LATERAL VIEW json_tuple(rj.ResponseJson, 'locations') v1 AS locations
After executing the cell, the schema shows as:
root
|-- TrackingLabel: string (nullable = true)
|-- locations: string (nullable = true)
I then defined a new notebook cell with this:
SELECT locationx.trackingLabel
FROM response_json_locations AS rjl
LATERAL VIEW explode(rjl.locations) l AS locationx
When I execute this cell, I receive the following error:
Error in SQL statement: AnalysisException: cannot resolve 'explode(rjl.`locations`)' due to data type mismatch: input to function explode should be array or map type, not StringType; line 3 pos 2;
'Project ['locationx.trackingLabel]
+- 'Generate explode(locations#6127), true, false, l, ['locationx]
   +- SubqueryAlias rjl
      +- SubqueryAlias response_json_locations
         +- Relation[TrackingLabel#6126,locations#6127] parquet
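The error message seems to come down to a type distinction: the locations column produced by json_tuple is text that happens to contain a JSON array, not an array value. The sketch below illustrates that distinction in plain Python rather than Spark SQL. It is an illustration only, not a Spark fix; the sample value is abbreviated from the JSON above (elided fields left out), and the variable names are hypothetical.

```python
import json

# Abbreviated stand-in for one ResponseJson value (elided fields omitted).
response_json = (
    '{"locations": [{"trackingLabel": "0", '
    '"inputDelta": {"grade": {"letter": "B-", "number": 80}}}]}'
)

# Mimic what a string-returning extractor hands back:
# the "locations" field serialized as text.
locations_text = json.dumps(json.loads(response_json)["locations"])
assert isinstance(locations_text, str)  # still a string, not an array

# Only after parsing that text is there an array whose elements can be visited.
locations = json.loads(locations_text)
grades = [
    (loc["inputDelta"]["grade"]["letter"], loc["inputDelta"]["grade"]["number"])
    for loc in locations
]
```

In Spark terms, the analogous step would be turning the locations string into a real array column before calling explode on it; which function is available for that depends on the Spark version.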
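I have written and rewritten that last cell a dozen times trying to figure out how to access the JSON array. I need the inputDelta.grade.letter and inputDelta.grade.number values inside each element of the locations JSON array. I must be missing something fairly simple in how to explode response.locations. Any guidance you can offer is much appreciated.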