Counting 1s and 0s per group in Pig

Date: 2015-11-11 23:17:49

Tags: hadoop hive apache-pig

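How can I count the number of 1s and 0s for each event type? I am doing all of this in Pig, and the second field contains only 1s and 0s. The data looks like this: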

(pageLoad,1)
(pageLoad,0)
(pageLoad,1) 
(appLaunch,1)
(appLaunch,0)
(otherEvent,1) 
(otherEvent,0)
(event,1)
(event,1)
(event,0)
(somethingelse,0)

The output would look something like one of the following:

pageLoad 1:234 0:2359
appLaunch 1:54 0:111
event 1:345 0:0

type 1 0 
pageLoad 21 345
appLaunch 0 123
event 234 12

Thanks, everyone.

1 answer:

Answer 0 (score: 1)

Input:

pageLoad,1
pageLoad,0
pageLoad,1 
appLaunch,1
appLaunch,0
otherEvent,1 
otherEvent,0
event,1
event,1
event,0
somethingelse,0

Pig script:

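-- Load the CSV as (event_type, status); status is expected to be 1 or 0.
A = LOAD 'input.csv' USING PigStorage(',') AS (event_type:chararray, status:int);
-- Collect all rows for each event type into one bag.
B = GROUP A BY event_type;
req = FOREACH B {
    -- Split each group's bag by status, then count each part.
    event_type_1 = FILTER A BY status == 1;
    event_type_0 = FILTER A BY status == 0;
    GENERATE group AS event_type,
             COUNT(event_type_1) AS event_type_1_count,
             COUNT(event_type_0) AS event_type_0_count;
};
DUMP req;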

Output, as (event_type, count of 1s, count of 0s):

(event,2,1)
(pageLoad,2,1)
(appLaunch,1,1)
(otherEvent,1,1)
(somethingelse,0,1)
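
Because status only takes the values 1 and 0, the same counts could probably also be obtained without the nested FILTER steps: the sum of status within a group equals the number of 1s, and the remaining rows are the 0s. A minimal sketch, assuming the same input.csv and schema as above and that status is never null or any value other than 1 and 0 (the aliases ones_count and zeros_count are illustrative names, not part of the original answer):

```pig
-- Assumes status is always exactly 1 or 0 (no nulls, no other values).
A = LOAD 'input.csv' USING PigStorage(',') AS (event_type:chararray, status:int);
B = GROUP A BY event_type;
req = FOREACH B GENERATE
    group AS event_type,
    SUM(A.status) AS ones_count,               -- each 1 adds one to the sum
    COUNT(A) - SUM(A.status) AS zeros_count;   -- rows that are not 1s
DUMP req;
```

For the sample input this should give the same tuples as the script above. The FILTER version remains the safer choice if status can ever hold values other than 1 and 0.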