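I followed these instructions to get my AWS WAF data into an Athena table.
I would like to query the data to find the latest requests with an action of BLOCK. This query works: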
SELECT
from_unixtime(timestamp / 1000e0) AS date,
action,
httprequest.clientip AS ip,
httprequest.uri AS request,
httprequest.country as country,
terminatingruleid,
rulegrouplist
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
LIMIT 100;
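My issue is cleanly identifying the "terminating rule", that is, the reason why the request was blocked. As an example, a result has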
terminatingrule = AWS-AWSManagedRulesCommonRuleSet
and
rulegrouplist = [
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesAmazonIpReputationList",
"terminatingrule": "null",
"excludedrules": "null"
},
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesKnownBadInputsRuleSet",
"terminatingrule": "null",
"excludedrules": "null"
},
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesLinuxRuleSet",
"terminatingrule": "null",
"excludedrules": "null"
},
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesCommonRuleSet",
"terminatingrule": {
"rulematchdetails": "null",
"action": "BLOCK",
"ruleid": "NoUserAgent_HEADER"
},
"excludedrules":"null"
}
]
The piece of data I would like separated into its own column is rulegrouplist[terminatingrule].ruleid, which has a value of NoUserAgent_HEADER.
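AWS provides useful information on querying nested Athena arrays, but I have been unable to get the result I want.
I have framed this as an AWS question, but since Athena uses SQL queries, anyone with good SQL skills could likely solve it.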
Answer 0 (score: 2)
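It's not entirely clear to me what you want, but I'm going to assume you are after the array element where terminatingrule is not "null" (and, if there are several, that you want the first one).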
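The documentation you linked to says that the rulegrouplist column is of type array<string>. The reason it is string rather than a complex type is that there seem to be multiple different schemas for this column. One example is that the terminatingrule property is either the string "null" or a struct/object, something that can't be described using Athena's type system.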
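This is not a problem, however. When dealing with JSON, there's a whole set of JSON functions you can use. Here's one way to use json_extract together with filter and element_at to remove the array elements where the terminatingrule property is the string "null", and then pick the first of the remaining elements: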
SELECT
element_at(
filter(
rulegrouplist,
rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
),
1
) AS first_non_null_terminatingrule
FROM waf_logs
WHERE action = 'BLOCK'
ORDER BY timestamp DESC -- this query does not select the date alias, so sort on the raw column
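You say you want the "latest", which to me is ambiguous and could mean either the first or the last non-null element. The query above returns the first non-null element. If you want the last one, change the second argument of element_at to -1 (Athena's array indexing starts at 1, and -1 counts from the end).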
To return just the ruleid element of the JSON:
SELECT
from_unixtime(timestamp / 1000e0) AS date,
action,
httprequest.clientip AS ip,
httprequest.uri AS request,
httprequest.country as country,
terminatingruleid,
json_extract(
element_at(
filter(
rulegrouplist,
rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
),
1
),
'$.terminatingrule.ruleid'
) AS ruleid
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
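If it helps to see the logic outside Athena, the following Python sketch mimics the filter, element_at, and json_extract chain on an abbreviated copy of the sample data from the question. It is only a model of the query, not something Athena runs, and first_terminating_ruleid is a hypothetical helper name.

```python
import json

# Abbreviated copy of the rulegrouplist sample from the question:
# each array element is a JSON string, as in an array<string> column.
rulegrouplist = [
    json.dumps({"rulegroupid": "AWS#AWSManagedRulesAmazonIpReputationList",
                "terminatingrule": "null"}),
    json.dumps({"rulegroupid": "AWS#AWSManagedRulesKnownBadInputsRuleSet",
                "terminatingrule": "null"}),
    json.dumps({"rulegroupid": "AWS#AWSManagedRulesLinuxRuleSet",
                "terminatingrule": "null"}),
    json.dumps({"rulegroupid": "AWS#AWSManagedRulesCommonRuleSet",
                "terminatingrule": {"rulematchdetails": "null",
                                    "action": "BLOCK",
                                    "ruleid": "NoUserAgent_HEADER"}}),
]


def first_terminating_ruleid(groups):
    """Hypothetical helper mirroring filter -> element_at(..., 1) -> json_extract."""
    # filter(...): keep elements whose terminatingrule is not the string "null"
    kept = [g for g in groups if json.loads(g)["terminatingrule"] != "null"]
    if not kept:
        return None  # no rule group terminated the request
    # element_at(..., 1): first remaining element (kept[-1] models the -1 case)
    return json.loads(kept[0])["terminatingrule"]["ruleid"]


ruleid = first_terminating_ruleid(rulegrouplist)
print(ruleid)
```

With the sample data only the last rule group survives the filter, so picking the first or the last remaining element gives the same result here.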
Answer 1 (score: 1)
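I had the same issue, but the solution posted by Theo didn't work for me, even though the table was created according to the instructions linked in the original post.
Here is what worked for me. It is basically the same as Theo's solution, but without the JSON conversion: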
SELECT
from_unixtime(timestamp / 1000e0) AS date,
action,
httprequest.clientip AS ip,
httprequest.uri AS request,
httprequest.country as country,
terminatingruleid,
rulegrouplist,
element_at(filter(ruleGroupList, ruleGroup -> ruleGroup.terminatingRule IS NOT NULL),1).terminatingRule.ruleId AS ruleId
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
LIMIT 100;