Map Reduce Framework

Time: 2014-09-28 05:34:14

Tags: java

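I have a text file, say:

This is Apache pig, works like a charm.

I want to count how many times each character occurs. The output should look like:

T = count of T
H = count of H
A = count of A
B = .........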

Can anyone tell me how to break my words into characters in Pig?
Any help would be greatly appreciated.

1 answer:

Answer 0 (score: 1)

input.txt  
This is Apache pig,  
works like  
a charm  

PigScript:  
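-- Load each line of the file as a single field  
A = LOAD 'input.txt' AS line;  
-- The empty pattern matches at every position, so this puts a newline around each character  
B = FOREACH A GENERATE (REPLACE(line,'','\n')) AS (word:chararray);  
-- Split on the newlines and flatten the bag: one row per character  
C = FOREACH B GENERATE FLATTEN(TOKENIZE(word,'\n'));  
-- Group identical characters together  
D = GROUP C BY $0;  
-- Emit each character with the size of its group  
E = FOREACH D GENERATE group,COUNT($1);  
DUMP E;  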

Output:  
( ,6)  
(,,1)  
(A,1)  
(T,1)  
(a,3)
(c,2)
(e,2)
(g,1)
(h,3)
(i,4)
(k,2)
(l,1)
(m,1)
(o,1)
(p,2)
(r,2)
(s,3)
(w,1)
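
To sanity-check the counts outside Pig, here is a minimal Python sketch of the same idea: treat every character of every line as its own token, then group and count. It is not part of the Pig answer. `count_chars` and the `sample` list are names made up for this illustration, and the sample assumes the three input lines have no trailing spaces, so whitespace counts may differ from a real file that does.

```python
from collections import Counter


def count_chars(lines):
    """Count how often each character occurs across the given lines.

    Hypothetical helper for illustration; counting is case-sensitive and
    line terminators are not counted.
    """
    counts = Counter()
    for line in lines:
        # Drop only the line terminator; spaces and punctuation are kept.
        # Updating a Counter with a string adds one per character.
        counts.update(line.rstrip("\r\n"))
    return counts


if __name__ == "__main__":
    # Assumed copy of input.txt, without trailing whitespace.
    sample = ["This is Apache pig,", "works like", "a charm"]
    for ch, n in sorted(count_chars(sample).items()):
        print(f"({ch},{n})")
```

Reading a real file would be a matter of passing the open file object, for example `count_chars(open('input.txt'))`, since iterating a file yields its lines.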