How to get the milliseconds between two epoch time values using a Pig script

Time: 2017-02-11 07:22:38

Tags: hadoop apache-pig epoch

Game_ID | BeginTime | EndTime

1 | 1235000140 | 1235002457
2 | 1235000377 | 1235003300
3 | 1235000414 | 1235056128
1 | 1235000414 | 1235056128
2 | 1235000377 | 1235003300

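Here I want to get the number of milliseconds between the two epoch time fields, BeginTime and EndTime, and then calculate the average time for each game.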

1 Answer:

Answer 0 (score: 1):

games = load 'games.txt' using PigStorage('|') as (gameid: int, begin_time: long, end_time:long);

dump games; 
(1,1235000140,1235002457)
(2,1235000377,1235003300)
(3,1235000414,1235056128)
(1,1235000414,1235056128)
(2,1235000377,1235003300)

Step 1: Compute the time difference

difference = foreach games generate gameid, end_time - begin_time as time_lapse;

dump difference;
(1,2317)
(2,2923)
(3,55714)
(1,55714)
(2,2923)
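
Note that the subtraction above yields a value in the same unit as the input. The 10-digit sample values (for example 1235000140, which falls in February 2009 when read as seconds) look like epoch seconds rather than milliseconds. If that assumption holds, a minimal sketch of the conversion to milliseconds could look like the following; the relation name difference_ms and the field name time_lapse_ms are illustrative and not part of the original answer.

```pig
-- Assumption: begin_time and end_time are epoch SECONDS.
-- Multiplying by the long literal 1000L converts the difference to milliseconds.
difference_ms = foreach games generate gameid, (end_time - begin_time) * 1000L as time_lapse_ms;

dump difference_ms;
-- (1,2317000)
-- (2,2923000)
-- (3,55714000)
-- (1,55714000)
-- (2,2923000)
```

If the input were already in epoch milliseconds, the plain subtraction from Step 1 would be sufficient.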

Step 2: Group the data by Game_ID

game_group = group difference by gameid;

dump game_group;
(1,{(1,55714),(1,2317)})
(2,{(2,2923),(2,2923)})
(3,{(3,55714)})

Step 3: Compute the average

average = foreach game_group generate group, AVG(difference.time_lapse);

dump average;
(1,29015.5)
(2,2923.0)
(3,55714.0)
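
For reference, the three steps can be combined into a single script. The sketch below also names the output fields and writes the result to a file instead of dumping it. The output path 'avg_game_time' and the field name avg_time_lapse are assumptions made for illustration, and the averages stay in the same unit as the input values.

```pig
-- Load the pipe-delimited input; 'games.txt' is the file name used in the answer.
games = load 'games.txt' using PigStorage('|') as (gameid: int, begin_time: long, end_time: long);

-- Step 1: elapsed time per row, in the same unit as the input columns.
difference = foreach games generate gameid, end_time - begin_time as time_lapse;

-- Step 2: collect all rows that belong to the same game.
game_group = group difference by gameid;

-- Step 3: average elapsed time per game, with explicit field names.
average = foreach game_group generate group as gameid, AVG(difference.time_lapse) as avg_time_lapse;

-- 'avg_game_time' is a hypothetical output directory.
store average into 'avg_game_time' using PigStorage('|');
```

Naming the fields with "as" makes the result easier to reference in later statements than the default positional names.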