我在Athena中添加了一个表,用于查询应用程序负载平衡器日志。我使用以下查询创建了表,然后根据s3中的数据存储添加了分区。但仍然无法通过查询获取所需的数据。
表创建查询:
CREATE EXTERNAL TABLE IF NOT EXISTS {{DATABASE_NAME.TABLE_NAME}} (
type string,
time string,
elb string,
client_ip string,
client_port string,
target string,
request_processing_time int,
target_processing_time int,
response_processing_time int,
elb_status_code int,
target_status_code string,
received_bytes int,
sent_bytes int,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string
)
PARTITIONED BY(year string, month string, day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*) ([-0-9]*) ([-0-9]*) ([-0-9]*) ([-0-9]*) ([^ ]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) ([^ ]*)\" \"([^\"]*)\" ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*)'
) LOCATION 's3://{{BUCKET_NAME}}/AWSLogs/{{ACCOUNT_ID}}/elasticloadbalancing/us-west-2/';
分区查询:
ALTER TABLE alb_webapp add partition (year="2018", month="*", day="*")
location "s3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/2018/09/";
当我尝试运行一个简单的查询时说“选择”,这使我发现的结果为零。
我希望根据年份或月份进行分区。
答案 0 :(得分:1)
似乎AWS再次更改了其alb日志的格式。 AWS文档具有新的正则表达式以使用新的日志格式。下面的查询对我有用。
表创建:
CREATE EXTERNAL TABLE IF NOT EXISTS webapp_alb (
type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code string,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string,
domain_name string,
chosen_cert_arn string,
matched_rule_priority string,
request_creation_time string,
actions_executed string,
redirect_url string,
new_field string
)
PARTITIONED BY(year string, month string, day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' =
'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\"($| \"[^ ]*\")(.*)')
LOCATION 's3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/';
添加分区:
ALTER TABLE webapp_alb add partition (year="2018", month="*", day="*") location "s3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/2018/09/";
参考:
https://docs.aws.amazon.com/athena/latest/ug/application-load-balancer-logs.html https://docs.aws.amazon.com/athena/latest/ug/partitions.html