Querying nested JSON structures in AWS Athena

Time: 2018-11-15 08:02:06

Tags: amazon-athena presto

I have JSON documents in the following format, with a nested structure:

{
    "id": "p-1234-2132321-213213213-12312",
    "name": "athena to the rescue",
    "groups": [
        {
            "strategy_group": "anyOf",
            "conditions": [
                {
                    "strategy_conditions": "anyOf",
                    "entries": [
                        {
                            "c_key": "service",
                            "C_operation": "isOneOf",
                            "C_value": "mambo,bambo,jumbo"
                        },
                        {
                            "c_key": "hostname",
                            "C_operation": "is",
                            "C_value": "lols"
                        }
                    ]
                }
            ]
        }
    ],
    "tags": [
        "aaa",
        "bbb",
        "ccc"
    ]
}

I created a table in Athena to support it, using the following:

CREATE EXTERNAL TABLE IF NOT EXISTS filters ( id string, name string, tags array<string>, groups array<struct<
    strategy_group:string,
    conditions:array<struct<
        strategy_conditions:string,
        entries: array<struct<
            c_key:string,
            c_operation:string,
            c_value:string
        >>
    >>
>> ) row format serde 'org.openx.data.jsonserde.JsonSerDe' location 's3://filterios/policies/';

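My goal right now is to also query on the columns of the condition entries. I have tried a few queries, but SQL is not my strongest suit ;)

At the moment I have this query, which returns the entries arrays: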

select cnds.entries from 
filters,
UNNEST(filters.groups) AS t(grps),
UNNEST(grps.conditions) AS t(cnds)

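However, because this is a complex nested array, it is giving me a bit of a headache, and I am not sure this is the right way to query it.

Any hints are appreciated!

Thanks, R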

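2 answers:

Answer 0: (score: 0)

I am not sure I fully understood your question. Take a look at the example below; it may be useful to you. It unnests the entries array as well, so each entry becomes its own row and can be filtered on c_key.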

select id, name, tags,
grps.strategy_group,
cnds.strategy_conditions,
enes.c_key,enes.c_operation, enes.c_value from 
filters,
UNNEST(filters.groups) AS t(grps),
UNNEST(grps.conditions) AS t(cnds),
UNNEST(cnds.entries) AS t(enes)
where enes.c_key='service'
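
To make it clearer what the three chained UNNEST clauses do, here is a minimal sketch in plain Python (not Athena code). It assumes the sample document from the question, with the entry keys lowercased to match the column names declared in the table; the function name flatten_entries is made up for this illustration. Each UNNEST behaves like one level of a nested loop, and the WHERE clause is a filter on the flattened rows.

```python
# Plain-Python illustration of the chained UNNESTs above; not Athena code.
# Assumption: entry keys are lowercased to match the table's column names.
doc = {
    "id": "p-1234-2132321-213213213-12312",
    "name": "athena to the rescue",
    "groups": [
        {
            "strategy_group": "anyOf",
            "conditions": [
                {
                    "strategy_conditions": "anyOf",
                    "entries": [
                        {"c_key": "service", "c_operation": "isOneOf",
                         "c_value": "mambo,bambo,jumbo"},
                        {"c_key": "hostname", "c_operation": "is",
                         "c_value": "lols"},
                    ],
                }
            ],
        }
    ],
    "tags": ["aaa", "bbb", "ccc"],
}


def flatten_entries(document):
    """Yield one flat row per entry (hypothetical helper for illustration)."""
    for grps in document["groups"]:          # UNNEST(filters.groups)
        for cnds in grps["conditions"]:      # UNNEST(grps.conditions)
            for enes in cnds["entries"]:     # UNNEST(cnds.entries)
                yield {
                    "id": document["id"],
                    "name": document["name"],
                    "tags": document["tags"],
                    "strategy_group": grps["strategy_group"],
                    "strategy_conditions": cnds["strategy_conditions"],
                    "c_key": enes["c_key"],
                    "c_operation": enes["c_operation"],
                    "c_value": enes["c_value"],
                }


# Equivalent of: where enes.c_key = 'service'
rows = [row for row in flatten_entries(doc) if row["c_key"] == "service"]
for row in rows:
    print(row)
```

As in the SQL, a group or condition with an empty array contributes no rows, because the inner loop has nothing to iterate over.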

Answer 1: (score: 0)

Here is an example I worked on recently that might help:

My JSON:

{
"type": "FeatureCollection",
"features": [{
    "first": "raj",
    "geometry": {
        "type": "Point",
        "coordinates": [-117.06861096, 32.57889962]
    },
    "properties": "someprop"
}] 
}

The external table I created:

CREATE EXTERNAL TABLE `jsondata`(
  `type` string COMMENT 'from deserializer', 
  `features` array<struct<type:string,geometry:struct<type:string,coordinates:array<string>>>> COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ( 
  'paths'='features,type') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://vicinitycheck/rawData/jsondata/'
TBLPROPERTIES (
  'classification'='json')

Querying the data:

SELECT type AS TypeEvent,
     features[1].geometry.coordinates AS FeatherType
FROM test_vicinitycheck.jsondata
WHERE type = 'FeatureCollection'

test_vicinitycheck - the name of my database in Athena
jsondata - the name of the table in Athena
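
One detail in the query above that is easy to miss: array subscripts in Athena (Presto) start at 1, so features[1] is the first element of the array. The sketch below mimics that in plain Python, where lists start at 0. The helper athena_subscript is hypothetical and exists only for this illustration; the record is the sample JSON from this answer.

```python
# Plain-Python illustration of 1-based array subscripts; not Athena code.
record = {
    "type": "FeatureCollection",
    "features": [
        {
            "first": "raj",
            "geometry": {
                "type": "Point",
                "coordinates": [-117.06861096, 32.57889962],
            },
            "properties": "someprop",
        }
    ],
}


def athena_subscript(array, index):
    """Return array[index] using 1-based indexing (hypothetical helper)."""
    if index < 1 or index > len(array):
        # A subscript outside 1..len fails rather than wrapping around.
        raise IndexError(f"array subscript {index} is out of bounds")
    return array[index - 1]


# Equivalent of: features[1].geometry.coordinates
coordinates = athena_subscript(record["features"], 1)["geometry"]["coordinates"]
print(coordinates)
```

Because the table declares coordinates as array<string>, Athena would return the two values as strings; the sketch keeps them as the numbers found in the JSON.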

In case it helps, I have documented some examples on my blog: http://weavetoconnect.com/aws-athena-and-nested-json/