I am trying to query this JSON file (for debugging purposes it contains only one line!):
{
"appVersion": null,
"sessionIndex": "3",
"psdkLang": null,
"lamdbaAwsRequestId": "bb04330c-e1e7-4bbd-97b8-86fdb2ee0b7f",
"bundleID": "xyz",
"receiveTimestamp": "2017-03-31T01:45:30.796Z",
"type": "logEvent",
"userIdfv": null,
"osVersion": null,
"uniqueIndex": "9c6c3927-aa66-4974-adac-fd10fc83a1e5",
"userIdfa": null,
"eventName": "Rewarded Ads Ad Is Ready",
"deviceType": null,
"eventId": "shardId-000000000005:49571690399037302251611429510623174446442870333536993362",
"store1": "google",
"deviceLang": null,
"geoCode": null,
"sessionId": "34B4CEC8-9AA0-40DD-94C4-C5420F563F68",
"params": "{\"AdProvider\":\"AdColony\",\"AdIsReady\":\"false\"}",
"gameVersion": null,
"internetConnectionState": null,
"deviceModel": null,
"deviceTimeZone": null,
"time": "2017-03-31T10:44:50.117+0900",
"userId": "24176983"
}
I created a table in Amazon Athena:
CREATE EXTERNAL TABLE IF NOT EXISTS RV_QA.RAAIR (
`appversion` string,
`psdklang` string,
`bundleid` string,
`receivetimestamp` string,
`type` string,
`osversion` string,
`store1` string,
`devicelang` string,
`geocode` string,
`sessionid` string,
`eventName` string,
`params` map<string,string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1'
) LOCATION 's3://...'
TBLPROPERTIES ('has_encrypted_data'='false');
When I run this query:
select eventname from RAAIR;
everything works.
When I try to access the nested JSON (the params element):
select params['AdIsReady'] from RAAIR;
I get an "Internal error" message.
What am I missing here?
Answer:
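You mentioned in a comment that params contains backslashes used for escaping. That is because params is a string, not a nested object. Athena cannot build a MAP directly from a string, which is why you get the "Internal error" message.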
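To make the double encoding concrete, here is a minimal Python sketch that uses only the standard json module. It is an illustration outside Athena and does not show how Athena itself processes the data. The input is the question's record cut down to two fields. One decode leaves params as plain text, and a second decode is needed to reach AdIsReady.

```python
import json

# The question's record, reduced to two fields. The raw string keeps the
# backslash-escaped quotes exactly as they appear in the file.
line = r'{"eventName": "Rewarded Ads Ad Is Ready", "params": "{\"AdProvider\":\"AdColony\",\"AdIsReady\":\"false\"}"}'

record = json.loads(line)

# After one decode, params is still a str holding JSON text, not an object.
print(type(record["params"]).__name__)  # str

# A second decode turns that text into a dict with the nested keys.
params = json.loads(record["params"])
print(params["AdIsReady"])  # false
```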
If you cannot change the data so that params is a nested object, you can change the table definition so that params is a string:
CREATE EXTERNAL TABLE IF NOT EXISTS RV_QA.RAAIR (
...
`params` string
)
...
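Athena (Presto) lets you parse the JSON inside a string and query its values. Depending on your preference, there are at least two ways to do this. You can parse the string and cast it to a map, or you can parse it and extract the value directly: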
SELECT
CAST(json_parse(params) as MAP(varchar, varchar))['AdIsReady'] as AdIsReady1,
json_extract_scalar(json_parse(params), '$.AdIsReady') as AdIsReady2
FROM RV_QA.RAAIR LIMIT 10;
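If you can change the data instead, so that params becomes a nested object, one option is to rewrite the files before they are uploaded to S3. The sketch below is a hypothetical Python preprocessing helper. The name unnest_params is illustrative and is not part of Athena or any library. It assumes one JSON record per line, as in the question's file. With the data in this shape, the original map<string,string> definition of params should be able to read the values, but verify this against your own files.

```python
import json


def unnest_params(line: str) -> str:
    """Rewrite one JSON line so that "params" is a nested object, not a JSON string.

    Hypothetical helper: assumes the line holds a single JSON object and that
    "params", when it is a string, contains a JSON object.
    """
    record = json.loads(line)
    params = record.get("params")
    if isinstance(params, str):
        # Decode the embedded JSON text into a real object.
        record["params"] = json.loads(params)
    # json.dumps emits no newlines by default, so the record stays on one line.
    return json.dumps(record)


line = r'{"eventName": "Rewarded Ads Ad Is Ready", "params": "{\"AdProvider\":\"AdColony\",\"AdIsReady\":\"false\"}"}'
print(unnest_params(line))
# {"eventName": "Rewarded Ads Ad Is Ready", "params": {"AdProvider": "AdColony", "AdIsReady": "false"}}
```

To apply this to a whole file, call the helper once per line and write the results out, one per line.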