Counting the number of categories per instance in Pig

Time: 2014-10-26 20:56:39

Tags: apache-pig

I have a sensor output data file in the following format:

category   <tab> instance <space> instance2 <space> ... instanceN
category2  <tab> instanceX <space> instanceY <space> ... instanceZ

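Now, for each instance, I need to count how many categories contain that particular instance. I am new to Pig. Can anyone suggest how I should approach this?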

1 Answer:

Answer 0: (score: 0)

Can you try this?

input.txt
category        instance instance2 instanceN
category1       instanceX instanceY instanceZ
category2       instance instanceY

PigScript:
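-- PigStorage() splits on tab by default, so the category must be tab-separated from the instance list.
A = LOAD 'input.txt' USING PigStorage() AS (category:chararray, instances:chararray);
-- Split the space-separated instance list and emit one (category, instance) pair per instance.
B = FOREACH A GENERATE category, FLATTEN(TOKENIZE(instances, ' ')) AS instance;
-- Group the pairs by instance, then count the pairs (one per category) in each group.
C = GROUP B BY instance;
D = FOREACH C GENERATE group AS instance, COUNT(B) AS category_count;
DUMP D;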

Output:
(instance,2)
(instance2,1)
(instanceN,1)
(instanceX,1)
(instanceY,2)
(instanceZ,1)
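
One caveat: the script above counts (category, instance) pairs, so a category line that lists the same instance twice would be counted twice for that instance. The question does not say whether that can happen. If it can, a minimal sketch of a variant that removes duplicate pairs first might look like this (the alias B1 and the field names instance and category_count are illustrative choices, not part of the original answer):

```pig
-- Assumption: a single category line may repeat an instance.
A = LOAD 'input.txt' USING PigStorage() AS (category:chararray, instances:chararray);
B = FOREACH A GENERATE category, FLATTEN(TOKENIZE(instances, ' ')) AS instance;
-- Drop repeated (category, instance) pairs so each category counts at most once per instance.
B1 = DISTINCT B;
C = GROUP B1 BY instance;
D = FOREACH C GENERATE group AS instance, COUNT(B1) AS category_count;
DUMP D;
```

For the sample input.txt above, no line repeats an instance, so this variant should produce the same counts as the original script. The order of the tuples printed by DUMP is not guaranteed.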