Amazon Athena returns an "Internal Error" when parsing nested JSON

Time: 2017-04-26 13:12:37

Tags: json amazon-athena

I am trying to query this JSON file (for debugging purposes, it contains only one row!):

{
  "appVersion": null,
  "sessionIndex": "3",
  "psdkLang": null,
  "lamdbaAwsRequestId": "bb04330c-e1e7-4bbd-97b8-86fdb2ee0b7f",
  "bundleID": "xyz",
  "receiveTimestamp": "2017-03-31T01:45:30.796Z",
  "type": "logEvent",
  "userIdfv": null,
  "osVersion": null,
  "uniqueIndex": "9c6c3927-aa66-4974-adac-fd10fc83a1e5",
  "userIdfa": null,
  "eventName": "Rewarded Ads Ad Is Ready",
  "deviceType": null,
  "eventId": "shardId-000000000005:49571690399037302251611429510623174446442870333536993362",
  "store1": "google",
  "deviceLang": null,
  "geoCode": null,
  "sessionId": "34B4CEC8-9AA0-40DD-94C4-C5420F563F68",
  "params": "{\"AdProvider\":\"AdColony\",\"AdIsReady\":\"false\"}",
  "gameVersion": null,
  "internetConnectionState": null,
  "deviceModel": null,
  "deviceTimeZone": null,
  "time": "2017-03-31T10:44:50.117+0900",
  "userId": "24176983"
}

I created a table in Amazon Athena:

CREATE EXTERNAL TABLE IF NOT EXISTS RV_QA.RAAIR (
  `appversion` string,
  `psdklang` string,
  `bundleid` string,
  `receivetimestamp` string,
  `type` string,
  `osversion` string,
  `store1` string,
  `devicelang` string,
  `geocode` string,
  `sessionid` string,
  `eventName` string,
  `params` map<string,string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'  
) LOCATION 's3://...'
TBLPROPERTIES ('has_encrypted_data'='false');

When I run this query:
select eventname from RAAIR;
everything works fine.

When I try to use the nested JSON (the params element):
select params['AdIsReady'] from RAAIR;
I get an "Internal Error" message.

What am I missing here?

1 answer:

Answer 0: (score: 1)

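You mentioned in the comments that params contains backslashes used for escaping. That is because params is a string, not a nested object. Athena cannot create a MAP directly from a string, which is why you get the "Internal Error" message.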
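To see the difference outside Athena, here is a minimal Python sketch. It assumes a record trimmed down to two fields from the question's sample; the variable names are only illustrative. It shows that a single JSON parse leaves params as a plain string, and that a second parse is needed to reach the keys inside it.

```python
import json

# One record as it appears in the file, trimmed to two fields for brevity.
# The raw string keeps the backslashes exactly as they are stored in the data.
line = r'{"eventName": "Rewarded Ads Ad Is Ready", "params": "{\"AdProvider\":\"AdColony\",\"AdIsReady\":\"false\"}"}'

record = json.loads(line)

# After the outer parse, params is still a string rather than a dict.
params_type = type(record["params"]).__name__

# A second parse turns that string into a dict with the nested keys.
params = json.loads(record["params"])
ad_is_ready = params["AdIsReady"]

print(params_type, ad_is_ready)
```

The SerDe performs only the equivalent of the outer parse, so a column declared as map<string,string> ends up being fed a string.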

If you cannot change the data so that params is a nested object, you can change the table definition so that params is a string:

CREATE EXTERNAL TABLE IF NOT EXISTS RV_QA.RAAIR (
  ...
  `params` string
)
...

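Athena (Presto) lets you parse the JSON held in the string and query its values. There are at least two ways to do this, depending on your preference: parse the string and cast it to a MAP, or extract a single value with a JSON path: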

SELECT
  CAST(json_parse(params) as MAP(varchar, varchar))['AdIsReady'] as AdIsReady1,
  json_extract_scalar(json_parse(params), '$.AdIsReady') as AdIsReady2
FROM RV_QA.RAAIR LIMIT 10;
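
If changing the data is an option after all, the records could be rewritten so that params becomes a real nested object before they are uploaded to S3. The following is only a sketch under assumptions: unnest_params is a hypothetical helper, not part of any AWS tooling, and it assumes one JSON record per line, which is the layout the JSON SerDe expects.

```python
import json


def unnest_params(line: str) -> str:
    """Hypothetical helper: decode the string-encoded params field of one record."""
    record = json.loads(line)
    raw = record.get("params")
    if isinstance(raw, str):
        try:
            decoded = json.loads(raw)
        except json.JSONDecodeError:
            decoded = None
        # Replace the field only when the string really held a JSON object.
        if isinstance(decoded, dict):
            record["params"] = decoded
    # Emit a compact single-line record again.
    return json.dumps(record, separators=(",", ":"))


# Sample record trimmed to two fields from the question's data.
sample = r'{"userId": "24176983", "params": "{\"AdProvider\":\"AdColony\",\"AdIsReady\":\"false\"}"}'
converted = unnest_params(sample)
print(converted)
```

With records in this shape, the original map<string,string> definition for params should be usable, and params['AdIsReady'] could be queried without json_parse.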