Pig script to parse AWS ELB logs

Time: 2016-09-29 16:33:35

Tags: hadoop apache-pig hadoop2

I am trying to parse this ELB log with Pig, and I can parse it successfully with the script below.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2016-07-16T00:00:41.700161Z testelb 11.11.17.2:50883 192.168.1.94:80 0.00002 0.001392 0.000019 200 200 0 43 "GET http://test.example.com:80/bac?aid=b5cf542d74&cid=etrsewtp&bid=23c45c543&dte=Sat%20Jul%2016%202016%2008:00:41%20GMT+0800%20(HKT) HTTP/1.1" "Mozilla/5.0 (iPhone; CPU iPhone OS 9_3_2 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Mobile/13F69" - -
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

***************************************************************
A = LOAD '/tmp/one.log' USING TextLoader AS (line:chararray);

B = FOREACH A GENERATE FLATTEN (
    REGEX_EXTRACT_ALL(
            line,'^(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) "(.+?)" "(.+?)" (\\S+) (\\S+)')
    ) AS (
    timestamp:chararray, elb:chararray, client_port:chararray, backend_port:chararray,
    request_processing_time:float, backend_processing_time:float, response_processing_time:float,
    elb_status_code:int, backend_status_code:int, received_bytes:int, sent_bytes:int,
    request:chararray, user_agent:chararray, ssl_cipher:chararray, ssl_protocol:chararray
);

DUMP B;

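Now I want to extract the request URL, aid, bid, cid and so on, but I cannot get a regular expression to match them. Can someone help me extract these details?

Apart from the regex approach above, if there is any other way to get the complete ELB log details, I would like to know about it.

Note: the positions of aid, bid and cid within the request are not fixed.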

1 answer:

Answer 0: (score: 1)

Your question has already been answered here.

An alternate way to do the same task requires a custom loader.
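
For the regex route, a minimal sketch follows. It assumes the relation B and its `request` field from the script in the question; the relation name C and the output field name `url` are made up for illustration. Each query parameter gets its own pattern anchored on `?` or `&`, so the order of aid, bid and cid in the query string does not matter. The patterns are wrapped in `.*` so that they should behave the same whether REGEX_EXTRACT applies the pattern to a substring or to the whole string.

```pig
-- Assumes B from the question, where request looks like
-- 'GET http://host:80/path?aid=...&cid=...&bid=... HTTP/1.1'.
C = FOREACH B GENERATE
    timestamp,
    -- second whitespace-separated token of the request is the full URL
    REGEX_EXTRACT(request, '^\\S+ (\\S+) .*', 1)       AS url:chararray,
    -- one pattern per parameter; each value ends at the next '&' or space
    REGEX_EXTRACT(request, '.*[?&]aid=([^& ]*).*', 1)  AS aid:chararray,
    REGEX_EXTRACT(request, '.*[?&]bid=([^& ]*).*', 1)  AS bid:chararray,
    REGEX_EXTRACT(request, '.*[?&]cid=([^& ]*).*', 1)  AS cid:chararray;

DUMP C;
```

A parameter that is absent from a request comes back as null. The extracted values are still URL-encoded (for example `%20` for a space), so decoding them would need an additional step such as a UDF.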