I need help writing a Pig script that extracts the log lines between two given timestamps.
Sample log file:
2016/08/17 09:00:00 This is log line 1
2016/08/17 09:05:00 This is log line 2
2016/08/17 10:00:00 This is log line 3
2016/08/18 09:00:00 This is log line 4
Answer 0 (score: 0)
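Assuming the data is tab-delimited, load the log into two fields and filter on the first one. The timestamps are not in ISO 8601 format, so load the first field as a chararray and convert it with ToDate before comparing. Note that > and < exclude lines stamped exactly at either boundary; use >= and <= to include them.
A = LOAD 'log.txt' USING PigStorage('\t') AS (dt:chararray, line:chararray);
-- Parse the timestamp explicitly; the pattern matches 2016/08/17 09:00:00.
A2 = FOREACH A GENERATE ToDate(dt, 'yyyy/MM/dd HH:mm:ss') AS dt, line;
B = FILTER A2 BY dt > ToDate('2016/08/17 09:00:00', 'yyyy/MM/dd HH:mm:ss') AND dt < ToDate('2016/08/18 09:00:00', 'yyyy/MM/dd HH:mm:ss');
DUMP B;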
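The sample lines in the question appear to separate the timestamp from the message with a space rather than a tab. If that is the case, one possible variation is to read each line whole and parse its fixed-width, 19-character prefix. This is only a sketch: the relation and field names (logs, parsed, in_range, result, entry, ts) are made up for illustration, and it assumes every line starts with a yyyy/MM/dd HH:mm:ss timestamp.

```pig
-- Read every line as a single chararray field.
logs = LOAD 'log.txt' USING TextLoader() AS (entry:chararray);

-- Assumption: each line begins with a 19-character timestamp.
parsed = FOREACH logs GENERATE
    ToDate(SUBSTRING(entry, 0, 19), 'yyyy/MM/dd HH:mm:ss') AS ts,
    entry;

-- Keep lines strictly between the two boundaries.
in_range = FILTER parsed BY
    ts > ToDate('2016/08/17 09:00:00', 'yyyy/MM/dd HH:mm:ss') AND
    ts < ToDate('2016/08/18 09:00:00', 'yyyy/MM/dd HH:mm:ss');

-- Return the original log lines only.
result = FOREACH in_range GENERATE entry;
DUMP result;
```

Lines that do not start with a timestamp would need to be filtered out first, since the sketch does not handle them.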
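Using parameters (each one needs its own -param flag, and the script refers to a parameter as $name; substitution is textual, so keep the quotes around it in the script):
pig -f myscript.pig -param start_dt='2016/08/17 09:00:00' -param end_dt='2016/08/18 09:00:00'
Depending on the Pig version, a value containing a space may need an extra layer of quoting, for example -param "start_dt='2016/08/17 09:00:00'".
A = LOAD 'log.txt' USING PigStorage('\t') AS (dt:chararray, line:chararray);
A2 = FOREACH A GENERATE ToDate(dt, 'yyyy/MM/dd HH:mm:ss') AS dt, line;
B = FILTER A2 BY dt > ToDate('$start_dt', 'yyyy/MM/dd HH:mm:ss') AND dt < ToDate('$end_dt', 'yyyy/MM/dd HH:mm:ss');
DUMP B;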
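If the script should also run when no -param values are passed, Pig's %default directive can supply fallback values. The following is a sketch under the same tab-delimited assumption; the fallback timestamps are simply the ones from the example above, and the intermediate relation name A2 is illustrative.

```pig
-- Fallback values, used only when the command line does not supply them.
%default start_dt '2016/08/17 09:00:00'
%default end_dt '2016/08/18 09:00:00'

A = LOAD 'log.txt' USING PigStorage('\t') AS (dt:chararray, line:chararray);
-- Parse the non-ISO timestamp explicitly.
A2 = FOREACH A GENERATE ToDate(dt, 'yyyy/MM/dd HH:mm:ss') AS dt, line;
B = FILTER A2 BY dt > ToDate('$start_dt', 'yyyy/MM/dd HH:mm:ss')
             AND dt < ToDate('$end_dt', 'yyyy/MM/dd HH:mm:ss');
DUMP B;
```

Values given with -param on the command line take precedence over the %default values.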