I have some data in the form (name, score): A 10, B 25, C 15, A 5, A 36, B 98, C 78, C 78, B 12
data = LOAD 'demo.txt' using PigStorage (',') as (name : chararray , score : int);
groupScore = GROUP data by score;
totalscore = FOREACH groupScore Generate data.name , SUM(data.score);
When I use the SUM() function, the output looks like
{(A)(A)(A), (51)}
{(B)(B)(B), (135)}
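I would like to know whether there is any way to display it as
{(A), (51)},
without the name field being repeated for every occurrence?
Any guidance would be helpful.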
Answer 0 (score: 3)
Here is a query that solves this. Group by name and emit the group key instead of data.name:
data = LOAD 'demo.txt' USING PigStorage(',') AS (name:chararray,score:int);
groupScore = group data by name;
result= FOREACH groupScore GENERATE group,SUM(data.score);
Output
(A,51)
(B,135)
(C,171)
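If the bag-style layout from the question is specifically wanted (the name wrapped in a bag, but listed only once), a nested FOREACH with DISTINCT is one possible approach. This is only a sketch: it assumes the same demo.txt and schema as above, and the aliases names, uniqNames and bagResult are made up for illustration.

```
data = LOAD 'demo.txt' USING PigStorage(',') AS (name:chararray, score:int);
groupScore = GROUP data BY name;
bagResult = FOREACH groupScore {
    -- project the name column of the grouped bag, then drop duplicates
    names = data.name;
    uniqNames = DISTINCT names;
    -- one row per group: a bag holding the name once, plus the total score
    GENERATE uniqNames, SUM(data.score);
};
DUMP bagResult;
```

Each row should then carry a one-element bag with the name next to the sum. In most cases the plain group key from the query above is simpler and avoids the extra DISTINCT step.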
Answer 1 (score: 0)
Group by name:
data = LOAD 'demo.txt' USING PigStorage(',') AS (name:chararray, score:int);
groupScore = GROUP data by name;
-- emit the group key; data.name would return the whole bag of repeated names
totalscore = FOREACH groupScore GENERATE group, SUM(data.score);