Using Athena to get the terminating rule from rulegrouplist in AWS WAF logs

Time: 2020-07-23 13:40:29

Tags: sql amazon-athena amazon-waf

I followed these instructions to get my AWS WAF data into an Athena table.

I would like to query the data to find the most recent requests with an action of BLOCK. This query works:

SELECT
  from_unixtime(timestamp / 1000e0) AS date,
  action,
  httprequest.clientip AS ip,
  httprequest.uri AS request,
  httprequest.country as country,
  terminatingruleid,
  rulegrouplist
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
LIMIT 100;

My problem is cleanly identifying the "terminating rule", that is, the reason the request was blocked. For example, one result has

terminatingrule = AWS-AWSManagedRulesCommonRuleSet

rulegrouplist = [
  {
    "nonterminatingmatchingrules": [],
    "rulegroupid": "AWS#AWSManagedRulesAmazonIpReputationList",
    "terminatingrule": "null",
    "excludedrules": "null"
  },
  {
    "nonterminatingmatchingrules": [],
    "rulegroupid": "AWS#AWSManagedRulesKnownBadInputsRuleSet",
    "terminatingrule": "null",
    "excludedrules": "null"
  },
  {
    "nonterminatingmatchingrules": [],
    "rulegroupid": "AWS#AWSManagedRulesLinuxRuleSet",
    "terminatingrule": "null",
    "excludedrules": "null"
  },
  {
    "nonterminatingmatchingrules": [],
    "rulegroupid": "AWS#AWSManagedRulesCommonRuleSet",
    "terminatingrule": {
      "rulematchdetails": "null",
      "action": "BLOCK",
      "ruleid": "NoUserAgent_HEADER"
    },
    "excludedrules":"null"
  }
]

The data I would like to pull out into its own column is rulegrouplist[terminatingrule].ruleid, which has the value NoUserAgent_HEADER.

AWS provides useful information on querying nested Athena arrays, but I have not been able to get the result I want.

I have framed this as an AWS question, but since Athena uses SQL queries, anyone with good SQL skills can probably solve it.

2 answers:

Answer 0: (score: 2)

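It's not entirely clear to me what you want, but I will assume you are after the array element whose terminatingrule is not "null" (and that, if there is more than one, you want the first).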

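The documentation you link to says the rulegrouplist column is of type array<string>. It is string rather than a complex type because the column appears to follow several different schemas. For example, the terminatingrule property is either the string "null" or a struct/object, which is something Athena's type system cannot describe.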
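The mixed schema can be seen outside Athena with a minimal Python sketch. It assumes each element of rulegrouplist arrives as a JSON string shaped like the sample in the question; the names `elements` and `kind_of_terminatingrule` are made up for illustration and are not part of any AWS API.

```python
import json

# Two array elements shaped like the sample in the question (assumed data).
elements = [
    '{"rulegroupid": "AWS#AWSManagedRulesLinuxRuleSet", "terminatingrule": "null"}',
    '{"rulegroupid": "AWS#AWSManagedRulesCommonRuleSet", '
    '"terminatingrule": {"action": "BLOCK", "ruleid": "NoUserAgent_HEADER"}}',
]


def kind_of_terminatingrule(element: str) -> str:
    """Return the Python type name of the terminatingrule property."""
    return type(json.loads(element)["terminatingrule"]).__name__


# One element carries a plain string, the other a nested object.
kinds = [kind_of_terminatingrule(e) for e in elements]
print(kinds)
```

Because the same property holds a string in one element and an object in another, no single struct definition fits every element, which is consistent with the column being declared as array<string>.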

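This is not a problem, however. When working with JSON you have a whole set of JSON functions available. Here is one way to combine json_extract with filter and element_at: it removes the array elements whose terminatingrule property is the string "null" and then picks the first of the remaining elements: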

SELECT
  element_at(
    filter(
      rulegrouplist,
      rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
    ),
    1
  ) AS first_non_null_terminatingrule
FROM waf_logs
WHERE action = 'BLOCK'
ORDER BY timestamp DESC

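You say you want the "latest", which is ambiguous to me: it could mean either the first or the last non-null element. The query above returns the first non-null element. If you want the last one, change the second argument of element_at to -1 (Athena's array indexes start at 1, and -1 counts from the end).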
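For readers who want to check the selection logic without running Athena, here is a rough Python emulation of the filter and element_at steps, applied to data shaped like the sample in the question. The helpers `make_rulegroup`, `has_terminating_rule`, and `element_at` are hypothetical stand-ins written for this sketch; they only approximate the SQL functions of the same name.

```python
import json


def make_rulegroup(rulegroupid, terminatingrule="null"):
    """Build one array element as a JSON string (assumed shape, from the question)."""
    return json.dumps({
        "nonterminatingmatchingrules": [],
        "rulegroupid": rulegroupid,
        "terminatingrule": terminatingrule,
        "excludedrules": "null",
    })


rulegrouplist = [
    make_rulegroup("AWS#AWSManagedRulesAmazonIpReputationList"),
    make_rulegroup("AWS#AWSManagedRulesKnownBadInputsRuleSet"),
    make_rulegroup("AWS#AWSManagedRulesLinuxRuleSet"),
    make_rulegroup(
        "AWS#AWSManagedRulesCommonRuleSet",
        {"rulematchdetails": "null", "action": "BLOCK", "ruleid": "NoUserAgent_HEADER"},
    ),
]


def has_terminating_rule(rulegroup: str) -> bool:
    """Mirror the lambda: keep elements whose terminatingrule is not the string "null"."""
    return json.loads(rulegroup)["terminatingrule"] != "null"


def element_at(items, index):
    """1-based lookup; negative indexes count from the end; None when out of range."""
    if index == 0:
        raise ValueError("array indexes start at 1")
    position = index - 1 if index > 0 else len(items) + index
    return items[position] if 0 <= position < len(items) else None


matching = [rg for rg in rulegrouplist if has_terminating_rule(rg)]
first = element_at(matching, 1)
last = element_at(matching, -1)
ruleid = json.loads(first)["terminatingrule"]["ruleid"]
print(ruleid)
```

With this sample only one element survives the filter, so indexes 1 and -1 select the same element; the choice only matters when several rule groups report a terminating rule.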

To return just the ruleid element of the JSON:

SELECT
  from_unixtime(timestamp / 1000e0) AS date,
  action,
  httprequest.clientip AS ip,
  httprequest.uri AS request,
  httprequest.country AS country,
  terminatingruleid,
  json_extract(
    element_at(
      filter(
        rulegrouplist,
        rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
      ),
      1
    ),
    '$.terminatingrule.ruleid'
  ) AS ruleid
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC

Answer 1: (score: 1)

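I ran into the same problem, but the solution Theo posted did not work for me, even though my table was created following the instructions linked in the original post.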

Here is what worked for me. It is essentially the same as Theo's solution, but without the JSON conversion:

SELECT
  from_unixtime(timestamp / 1000e0) AS date,
  action,
  httprequest.clientip AS ip,
  httprequest.uri AS request,
  httprequest.country as country,
  terminatingruleid,
  rulegrouplist,
  element_at(filter(ruleGroupList, ruleGroup -> ruleGroup.terminatingRule IS NOT NULL),1).terminatingRule.ruleId AS ruleId
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
LIMIT 100;