AWS Athena - Querying JSON - Searching for values

Time: 2018-01-22 12:33:45

Tags: json amazon-web-services amazon-s3 amazon-athena

I have nested JSON files on S3 and am trying to query them with Athena.

However, I am having trouble querying the nested JSON values.

My JSON file looks like this:

{
  "id": "17842007980192959",
  "acount_id": "17841401243773780",
  "stats": [
    {
      "name": "engagement",
      "period": "lifetime",
      "values": [
        {
          "value": 374
        }
      ],
      "title": "Engagement",
      "description": "Total number of likes and comments on the media object",
      "id": "17842007980192959\/insights\/engagement\/lifetime"
    },
    {
      "name": "impressions",
      "period": "lifetime",
      "values": [
        {
          "value": 11125
        }
      ],
      "title": "Impressions",
      "description": "Total number of times the media object has been seen",
      "id": "17842007980192959\/insights\/impressions\/lifetime"
    },
    {
      "name": "reach",
      "period": "lifetime",
      "values": [
        {
          "value": 8223
        }
      ],
      "title": "Reach",
      "description": "Total number of unique accounts that have seen the media object",
      "id": "17842007980192959\/insights\/reach\/lifetime"
    },
    {
      "name": "saved",
      "period": "lifetime",
      "values": [
        {
          "value": 0
        }
      ],
      "title": "Saved",
      "description": "Total number of unique accounts that have saved the media object",
      "id": "17842007980192959\/insights\/saved\/lifetime"
    }
  ],
  "import_date": "2017-12-04"
}

What I want to do is query the "stats" field value where name = impressions.

Ideally, something like this:

SELECT id, account_id, stats.values.value WHERE stats.name='engagement'

AWS example: https://docs.aws.amazon.com/athena/latest/ug/searching-for-values.html

Any help would be greatly appreciated.

1 answer:

Answer 0: (score: 1)

You can query the JSON using the following table definition:

CREATE EXTERNAL TABLE test(
id string,
acount_id string,
stats array<
  struct<
    name:string,
    period:string,
    values:array<
      struct<value:string>
    >,
    title:string
  >
>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://bucket/';

Now the value column can be reached by unnesting the two arrays, as follows:

select id, acount_id, stat.name,x.value
from test
cross join UNNEST(test.stats) as st(stat)
cross join UNNEST(stat."values") as valx(x)
WHERE stat.name='engagement';
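
To make the row-by-row effect of the two CROSS JOIN UNNEST steps easier to follow, here is a rough plain-Python sketch that emulates the query on a trimmed copy of the sample record. It is only an illustration of the flattening logic, not how Athena executes the query; the function name unnest_stats is made up for this example, and only two of the four stats entries are included.

```python
import json

# Trimmed copy of the sample record from the question (two of the four stats).
record = json.loads("""
{
  "id": "17842007980192959",
  "acount_id": "17841401243773780",
  "stats": [
    {"name": "engagement", "period": "lifetime", "values": [{"value": 374}]},
    {"name": "impressions", "period": "lifetime", "values": [{"value": 11125}]}
  ]
}
""")


def unnest_stats(rows, name):
    """Hypothetical helper: emulate both UNNEST steps and the WHERE filter."""
    out = []
    for row in rows:
        # cross join UNNEST(test.stats) as st(stat): one row per stats element
        for stat in row["stats"]:
            # cross join UNNEST(stat."values") as valx(x): one row per values element
            for x in stat["values"]:
                # WHERE stat.name = <name>
                if stat["name"] == name:
                    out.append((row["id"], row["acount_id"], stat["name"], x["value"]))
    return out


result = unnest_stats([record], "engagement")
print(result)
```

Each element of stats becomes its own row, each element of that row's values array becomes a further row, and the filter then keeps only the rows whose name matches. Note that the table definition above declares value as string, so Athena would return it as text, whereas this sketch keeps the JSON number.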