在Athena分区查询雅典娜表时出现问题

时间:2018-09-17 08:43:57

标签: amazon-web-services amazon-elb

我在Athena中添加了一个表,用于查询应用程序负载平衡器日志。我使用以下查询创建了表,然后根据s3中的数据存储添加了分区。但仍然无法通过查询获取所需的数据。

表创建查询:

CREATE EXTERNAL TABLE IF NOT EXISTS {{DATABASE_NAME.TABLE_NAME}} (
  type string,
  time string,
  elb string,
  client_ip string,
  client_port string,
  target string,
  request_processing_time int,
  target_processing_time int,
  response_processing_time int,
  elb_status_code int,
  target_status_code string,
  received_bytes int,
  sent_bytes int,
  request_verb string,
  request_url string,
  request_proto string,
  user_agent string,
  ssl_cipher string,
  ssl_protocol string,
  target_group_arn string,
  trace_id string
)
PARTITIONED BY(year string, month string, day string) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1',
  'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*) ([-0-9]*) ([-0-9]*) ([-0-9]*) ([-0-9]*) ([^ ]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) ([^ ]*)\" \"([^\"]*)\" ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*)'
) LOCATION 's3://{{BUCKET_NAME}}/AWSLogs/{{ACCOUNT_ID}}/elasticloadbalancing/us-west-2/';

分区查询:

ALTER TABLE alb_webapp add partition (year="2018", month="*", day="*")
location "s3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/2018/09/";

当我尝试运行一个简单的查询时说“选择”,这使我发现的结果为零。

我希望根据年份或月份进行分区。

1 个答案:

答案 0 :(得分:1)

似乎AWS再次更改了其alb日志的格式。 AWS文档具有新的正则表达式以使用新的日志格式。下面的查询对我有用。

表创建:

CREATE EXTERNAL TABLE IF NOT EXISTS webapp_alb (
    type string,
    time string,
    elb string,
    client_ip string,
    client_port int,
    target_ip string,
    target_port int,
    request_processing_time double,
    target_processing_time double,
    response_processing_time double,
    elb_status_code string,
    target_status_code string,
    received_bytes bigint,
    sent_bytes bigint,
    request_verb string,
    request_url string,
    request_proto string,
    user_agent string,
    ssl_cipher string,
    ssl_protocol string,
    target_group_arn string,
    trace_id string,
    domain_name string,
    chosen_cert_arn string,
    matched_rule_priority string,
    request_creation_time string,
    actions_executed string,
    redirect_url string,
    new_field string
    )
    PARTITIONED BY(year string, month string, day string) 
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
    'serialization.format' = '1',
    'input.regex' = 
'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\"($| \"[^ ]*\")(.*)')
    LOCATION 's3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/';

添加分区:

ALTER TABLE webapp_alb add partition (year="2018", month="*", day="*") location "s3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/2018/09/";

参考:

  

https://docs.aws.amazon.com/athena/latest/ug/application-load-balancer-logs.html   https://docs.aws.amazon.com/athena/latest/ug/partitions.html