Question

Summary

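I am trying to create a database table so that I can query the logs rotated from an AWS Elastic Beanstalk environment. When I run the query that creates the table, no results are loaded into the table, so no query (for any value) can be run against it.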

Details

The specific query being run is:

CREATE EXTERNAL TABLE IF NOT EXISTS access_logs (
         request_timestamp string,
         elb_name string,
         request_ip string,
         request_port int,
         backend_ip string,
         backend_port int,
         request_processing_time double,
         backend_processing_time double,
         response_processing_time double,
         elb_response_code string,
         backend_response_code string,
         received_bytes bigint,
         sent_bytes bigint,
         request_verb string,
         url string,
         protocol string,
         user_agent string,
         ssl_cipher string,
         ssl_protocol string 
) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
         'serialization.format' = '1', 'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:\-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (\"[^\"]*\") ([A-Z0-9-]+) ([A-Za-z0-9.-]*)$' ) 
LOCATION 's3://elasticbeanstalk-us-east-1-000000000000/resources/environments/logs/publish/e-0x0x0x0x0x0/i-0f0101010100101/';

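The only change in the query above is that the S3 bucket location has been obfuscated. The query itself is based on the AWS documentation (https://docs.aws.amazon.com/athena/latest/ug/elasticloadbalancer-classic-logs.html). I know this is not a Classic Load Balancer, but that was the closest match among the examples provided.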

When the query runs, it completes quickly and returns:

(Run time: 0.39 seconds, Data scanned: 0 KB)

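Evidently it is not finding the right data to import. I have checked the S3 bucket location several times, and even cut and pasted the characters to rule out typos.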

S3 log details

When EBS rotates the logs, it rotates them every hour. The access logs are compressed and named in the format s3://elasticbeanstalk-us-east-1-000000000000/resources/environments/logs/publish/e-0x0x0x0x0x0/i-0f0101010100101/_var_log_httpd_rotated_access_log-1541120461.gz

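This means that this particular directory in S3 holds several months of small access log fragments. These are what I am trying to read with AWS Athena, but evidently the create query is not formatted correctly to read these files.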
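To make the key naming described above concrete, here is a minimal Python sketch that pulls the numeric suffix out of a rotated log key. It assumes that suffix is a Unix epoch timestamp in seconds, which the logs do not state explicitly; the names ROTATED_KEY and rotation_time are hypothetical and only for illustration.

```python
import re
from datetime import datetime, timezone

# Example key copied from the obfuscated S3 path above (bucket name omitted).
KEY = (
    "resources/environments/logs/publish/e-0x0x0x0x0x0/i-0f0101010100101/"
    "_var_log_httpd_rotated_access_log-1541120461.gz"
)

# Captures the digits between the fixed file-name prefix and the .gz extension.
ROTATED_KEY = re.compile(r"_var_log_httpd_rotated_access_log-(\d+)\.gz$")


def rotation_time(key):
    """Return the rotation time as an aware UTC datetime, or None if the key
    does not look like a rotated access log.

    Assumption: the numeric suffix is seconds since the Unix epoch.
    """
    match = ROTATED_KEY.search(key)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


print(rotation_time(KEY).isoformat())  # 2018-11-02T01:01:01+00:00
```

If the assumption holds, the suffix falls one minute past the hour, which would be consistent with the hourly rotation.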

The data in the files is a fairly standard HTTP access log. Sample data entry:

172.31.33.100 (45.249.01.01) - - [05/Oct/2018:07:46:50 +0000] "POST /v1/process/contact HTTP/1.1" 400 51 "https://example.com/page" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"
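As a quick local check of the format question, the sketch below tests the sample entry against two patterns in Python. ELB_PREFIX is just the opening part of the input.regex from the query above (three space-separated fields followed by ":port"). APACHE_PATTERN is a hypothetical pattern written to fit this one sample line, not an official AWS regex. The field names are guesses, in particular the assumption that the address in parentheses is a forwarded client IP.

```python
import re

# Sample entry copied verbatim from the rotated access log.
SAMPLE = (
    '172.31.33.100 (45.249.01.01) - - [05/Oct/2018:07:46:50 +0000] '
    '"POST /v1/process/contact HTTP/1.1" 400 51 "https://example.com/page" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"'
)

# Opening part of the Classic ELB pattern used in the CREATE TABLE query.
ELB_PREFIX = re.compile(r"([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ")

# Hypothetical pattern fitted to the sample line above (an assumption).
APACHE_PATTERN = re.compile(
    r'^(\S+) \(([^)]*)\) (\S+) (\S+) \[([^\]]+)\] '
    r'"(\S+) (\S+) (\S+)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$'
)

# Illustrative field names, one per capture group.
FIELDS = (
    "client_ip", "forwarded_ip", "ident", "user", "timestamp", "method",
    "path", "protocol", "status", "size", "referer", "user_agent",
)


def parse_line(line):
    """Return a dict of named fields, or None when the line does not match."""
    match = APACHE_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


print(ELB_PREFIX.match(SAMPLE))      # None: the ELB layout does not fit
print(parse_line(SAMPLE)["status"])  # 400
```

Since the Classic ELB pattern does not match even the start of this line, a table definition for these files would probably need an input.regex written for the Apache layout. A pattern like the hypothetical one here would still have to be adapted to the escaping rules of the Hive RegexSerDe and verified against real log files.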

Questions:

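Could this be a permissions issue? It seems unlikely since the query completes successfully, but I'm not sure.
Is it related to how the data is structured in the S3 bucket? Since this is an automated AWS process I have no control over the format of the data, but could the query be missing a data format specification?
Is it something else? As a newcomer to Athena, there may be some configuration or detail I'm not seeing.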

Thank you for reading; any help would be appreciated. Please let me know if I can provide any additional information.

AWS Athena create database for Elastic Beanstalk rotated logs

0 answers: