Getting empty parentheses as output when parsing Apache logs with Pig REGEX_EXTRACT_ALL

Time: 2014-07-22 19:54:21

Tags: regex hadoop apache-pig

This is a sample line from the input log:

 122.161.182.200    - Joe [21/Jul/2009:13:14:17 -0700] "GET /rss.pl HTTP/1.1" 
 200 35942 "-" "IE/4.0 (compatible; MSIE 7.0; Windows NT 6.0; 
 Trident/4.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.21022; InfoPath.2; 
 .NET CLR 3.5.30729; .NET CLR 3.0.30618; OfficeLiveConnector.1.3; 
 OfficeLivePatch.1.3; MSOffice 12)"

The Pig script is:

raw_logs = LOAD 'apacheLog.log' USING TextLoader AS (line:chararray);
logs_base = FOREACH raw_logs GENERATE FLATTEN(
    REGEX_EXTRACT_ALL(line, '^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\d{3}) (\\d+)"([^"]+)" "([^"]+)"')
) AS (remoteAddr: chararray, remoteLogname: chararray, user: chararray,
      time: chararray, request: chararray, status: int,
      bytes_string: chararray, referrer: chararray, browser: chararray);
dump logs_base;

When I dump logs_base, I get empty parentheses as output:

OUTPUT     ()     ()     ()     ()

Please help! Thanks in advance.
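
One thing worth checking, offered as a sketch rather than a confirmed diagnosis: the posted pattern has no space between `(\\d+)` and the opening quote of the referrer group, while the sample line does have one (`35942 "-"`). As far as I know, REGEX_EXTRACT_ALL returns null unless the pattern matches the entire line, and that would show up as `()` after FLATTEN. The Python check below is only a stand-in for Pig's Java regex engine, on the assumption that both treat this particular pattern the same way. `PATCHED`, `extract_all`, and the single-line, single-spaced `LINE` are illustrative assumptions, not part of the original script.

```python
import re

# Sample line from the question, joined onto one line with single spaces
# (assumption: the wrapping and extra spaces in the post are display artifacts).
LINE = (
    '122.161.182.200 - Joe [21/Jul/2009:13:14:17 -0700] '
    '"GET /rss.pl HTTP/1.1" 200 35942 "-" '
    '"IE/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/4.0; SLCC1; '
    '.NET CLR 2.0.50727; .NET CLR 3.5.21022; InfoPath.2; '
    '.NET CLR 3.5.30729; .NET CLR 3.0.30618; OfficeLiveConnector.1.3; '
    'OfficeLivePatch.1.3; MSOffice 12)"'
)

# Pattern as posted (Pig's doubled backslashes become single ones in a raw string).
POSTED = (
    r'^([\d.]+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(.+?)" '
    r'(\d{3}) (\d+)"([^"]+)" "([^"]+)"'
)

# Hypothetical variant: one space added between the byte count and the referrer quote.
PATCHED = (
    r'^([\d.]+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(.+?)" '
    r'(\d{3}) (\d+) "([^"]+)" "([^"]+)"'
)


def extract_all(pattern, line):
    """Return all capture groups if the whole line matches, else None."""
    m = re.fullmatch(pattern, line)
    return m.groups() if m else None


posted_result = extract_all(POSTED, LINE)
patched_result = extract_all(PATCHED, LINE)
print(posted_result)
print(patched_result)
```

If the real file has several spaces between fields, as the sample appears to, replacing the literal spaces in the pattern with `\\s+` may also be needed. And if each record is physically wrapped across several lines in the file, TextLoader would pass only fragments to the pattern, so no full-line match could succeed.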

0 answers:

No answers yet.