I have a file of sensor output data in this format:
category <tab> instance <space> instance2 <space> ... instanceN
category2 <tab> instanceX <space> instanceY <space> ... instanceZ
Now, for each instance, I need to count how many categories contain that instance. I'm new to Pig. Can anyone suggest how I should approach this?
Answer 0 (score: 0)
input.txt (a tab separates the category from its space-separated instances):
category instance instance2 instanceN
category1 instanceX instanceY instanceZ
category2 instance instanceY
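Pig script:
-- Load each line: a tab separates the category from the space-separated instances.
A = LOAD 'input.txt' USING PigStorage('\t') AS (category:chararray, instances:chararray);
-- Emit one (category, instance) pair per instance.
B = FOREACH A GENERATE category, FLATTEN(TOKENIZE(instances, ' ')) AS instance;
-- Group the pairs by instance and count the pairs in each group.
C = GROUP B BY instance;
D = FOREACH C GENERATE group AS instance, COUNT(B) AS num_categories;
DUMP D;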
Output:
(instance,2)
(instance2,1)
(instanceN,1)
(instanceX,1)
(instanceY,2)
(instanceZ,1)
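One caveat: the script counts (category, instance) pairs, so an instance listed twice on the same line would be counted twice for that category. If that can happen in your data, a minimal sketch of a variant (assuming the same tab/space layout; the alias names `B_unique`, `instance` and `num_categories` are just illustrative) is to drop the repeated pairs with DISTINCT before grouping:

```pig
A = LOAD 'input.txt' USING PigStorage('\t') AS (category:chararray, instances:chararray);
B = FOREACH A GENERATE category, FLATTEN(TOKENIZE(instances, ' ')) AS instance;
-- Drop repeated (category, instance) pairs so each category counts once per instance.
B_unique = DISTINCT B;
C = GROUP B_unique BY instance;
D = FOREACH C GENERATE group AS instance, COUNT(B_unique) AS num_categories;
DUMP D;
```

The sample input.txt has no repeated instances within a line, so this variant should give the same counts as listed above. DISTINCT adds an extra MapReduce step, so it is only worth including if duplicates are actually possible.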