AWS WAF log partition projection: Partitions not in metastore

Asked: 2021-06-11 03:24:26

Tags: amazon-athena amazon-waf

I'm trying to set up a table in Athena using partition projection. My logs are stored under s3://bucket/folder/year/month/day/hour, and each of those prefixes contains a JSON file.

I tried to create the table with partition projection like this:

CREATE EXTERNAL TABLE `waf_logs_webacl1`(
  `timestamp` bigint,
  `formatversion` int,
  `webaclid` string,
  `terminatingruleid` string,
  `terminatingruletype` string,
  `action` string,
  `terminatingrulematchdetails` array<
                                  struct<
                                    conditiontype:string,
                                    location:string,
                                    matcheddata:array<string>
                                        >
                                     >,
  `httpsourcename` string,
  `httpsourceid` string,
  `rulegrouplist` array<
                     struct<
                        rulegroupid:string,
                        terminatingrule:struct<
                           ruleid:string,
                           action:string,
                           rulematchdetails:string
                                               >,
                        nonterminatingmatchingrules:array<
                                                       struct<
                                                          ruleid:string,
                                                          action:string,
                                                          rulematchdetails:array<
                                                               struct<
                                                                  conditiontype:string,
                                                                  location:string,
                                                                  matcheddata:array<string>
                                                                     >
                                                                  >
                                                               >
                                                            >,
                        excludedrules:array<
                                         struct<
                                            ruleid:string,
                                            exclusiontype:string
                                               >
                                            >
                           >
                       >,
  `ratebasedrulelist` array<
                        struct<
                          ratebasedruleid:string,
                          limitkey:string,
                          maxrateallowed:int
                              >
                           >,
  `nonterminatingmatchingrules` array<
                                  struct<
                                    ruleid:string,
                                    action:string
                                        >
                                     >,
  `requestheadersinserted` string,
  `responsecodesent` string,
  `httprequest` struct<
                      clientip:string,
                      country:string,
                      headers:array<
                                struct<
                                  name:string,
                                  value:string
                                      >
                                   >,
                      uri:string,
                      args:string,
                      httpversion:string,
                      httpmethod:string,
                      requestid:string
                      >,
  `labels` array<
             struct<
               name:string
                   >
                  >
)
PARTITIONED BY
(
 day STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://bucket/folder/'
TBLPROPERTIES
(
 "projection.enabled" = "true",
 "projection.day.type" = "date",
 "projection.day.range" = "2021/01/01,NOW",
 "projection.day.format" = "yyyy/MM/dd/HH",
 "projection.day.interval" = "1",
 "projection.day.interval.unit" = "YEARS",
 "storage.location.template" = "s3://bucket/folder/${year}/${month}/${day}/${hour}/"
)

The table is created successfully, but when I load all of its partitions I get this error:

Partitions not in metastore:    waf_logs_webacl1:2021/05/16/23  waf_logs_webacl1:2021/05/17/00  waf_logs_webacl1:2021/05/17/01  waf_logs_webacl1:2021/05/17/02  waf_logs_webacl1:2021/05/17/03 etc

I also tried setting storage.location.template to s3://bucket/folder/s3://bucket/folder/${year}/ and got the same error when loading partitions. Any help would be appreciated, thanks.

1 Answer:

Answer 0 (score: 0)

When you use partition projection you don't need to load partitions. Athena works out the partitions at query execution time.

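The problem with your table is that it has a single partition key, day, but you have told Athena that the data is stored in a directory structure containing /${year}/${month}/${day}/${hour}/, i.e. four partition keys.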

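Either you need to create the table with all four partition keys and configure partition projection for each of them (e.g. projection.year.type and so on), or you need to remove the undefined keys from the storage location template.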
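
For reference, the second option could look roughly like the sketch below: the single key day keeps its yyyy/MM/dd/HH format and the template refers only to ${day}. This is not tested against your data. The table name waf_logs_webacl1_by_hour is made up, the column list is shortened for illustration (the JSON SerDe should simply ignore fields that are not declared), and the bucket and folder are the placeholders from the question. Note that I have also assumed an interval unit of HOURS instead of YEARS, since the format goes down to the hour, and a range start that matches the format.

```sql
-- Sketch only: one projected key `day` covering the whole yyyy/MM/dd/HH path.
-- Table name and shortened column list are assumptions for illustration.
CREATE EXTERNAL TABLE waf_logs_webacl1_by_hour (
  `timestamp` bigint,
  `action` string,
  `httprequest` struct<clientip:string, country:string, uri:string>
)
PARTITIONED BY (`day` string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://bucket/folder/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.day.type' = 'date',
  'projection.day.range' = '2021/01/01/00,NOW',
  'projection.day.format' = 'yyyy/MM/dd/HH',
  'projection.day.interval' = '1',
  'projection.day.interval.unit' = 'HOURS',
  'storage.location.template' = 's3://bucket/folder/${day}/'
);
```

With this layout a filter such as day = '2021/05/17/00' should select a single hourly prefix.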

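I think the first option (four partition keys) is the right one, since that is how the data is organized. There is an example in the Athena documentation that you should be able to use as a starting point: https://docs.aws.amazon.com/athena/latest/ug/partition-projection-kinesis-firehose-example.html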
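
As a rough, untested sketch of the four-key variant: each key gets its own projection settings, and the template uses all four. The table name waf_logs_webacl1_by_parts, the int column types, the integer projection type with zero padding via digits, and the upper bound 2030 for year are all my assumptions, so check them against the partition projection documentation before relying on this. The column list is again shortened for illustration.

```sql
-- Sketch only: four projected keys matching /year/month/day/hour/.
-- Table name, int key types, ranges and the shortened column list are assumptions.
CREATE EXTERNAL TABLE waf_logs_webacl1_by_parts (
  `timestamp` bigint,
  `action` string,
  `httprequest` struct<clientip:string, country:string, uri:string>
)
PARTITIONED BY (`year` int, `month` int, `day` int, `hour` int)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://bucket/folder/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.year.type' = 'integer',
  'projection.year.range' = '2021,2030',
  'projection.month.type' = 'integer',
  'projection.month.range' = '1,12',
  'projection.month.digits' = '2',
  'projection.day.type' = 'integer',
  'projection.day.range' = '1,31',
  'projection.day.digits' = '2',
  'projection.hour.type' = 'integer',
  'projection.hour.range' = '0,23',
  'projection.hour.digits' = '2',
  'storage.location.template' = 's3://bucket/folder/${year}/${month}/${day}/${hour}/'
);
```

You would then query it directly, without loading partitions first, and filter on the keys to limit which prefixes are read:

```sql
-- Counts requests per action for one day; the dates are just an example.
SELECT action, count(*) AS requests
FROM waf_logs_webacl1_by_parts
WHERE year = 2021 AND month = 5 AND day = 17
GROUP BY action;
```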
