Say I have some data like this:
1,A
1,A
1,B
2,C
2,D
3,E
3,E
I'd like to group by the first column and then return the distinct values within each group:
1,A,B
2,C,D
3,E
or
1,{A,B}
2,{C,D}
3,{E}
Is there a way to do this other than with a UDF?
If I do this:
DATA = LOAD 'data.txt' USING PigStorage(',') AS (a:int, b:chararray);
GROUPED = GROUP DATA BY a;
UNIQUES = FOREACH GROUPED {
    distinct_bs = DISTINCT GROUPED.b;
    GENERATE
        group AS a
        ,FLATTEN(distinct_bs)
    ;
}
(with or without the FLATTEN, and whether or not I include group AS a), I get:
ERROR 1200: org.apache.pig.newplan.logical.expression.ScalarExpression
cannot be cast to org.apache.pig.newplan.logical.expression.ProjectExpression
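Answer 0 (score: 0)
GROUPED has no field b. After the GROUP, the original rows sit in a bag named after the source relation, DATA, and that bag is what contains b: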
DESCRIBE GROUPED
GROUPED: {group: int,DATA: {(a: int,b: chararray)}}
Try the following:
UNIQUES = FOREACH GROUPED {
    distinct_bs = DISTINCT DATA.b;
    GENERATE
        group AS a,
        distinct_bs;
}
Result:
(1,{(A),(B)})
(2,{(C),(D)})
(3,{(E)})
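The result above matches the second format asked for in the question. For the first format (1,A,B), one possible sketch is to join each distinct bag into a delimited string with the built-in BagToString. This assumes a Pig version that ships BagToString (0.11 or later, as far as I know); the alias bs and the output path 'uniques_out' are made up for illustration. The order of values inside a bag is not guaranteed, so the joined string may not always come out alphabetically.

```pig
DATA = LOAD 'data.txt' USING PigStorage(',') AS (a:int, b:chararray);
GROUPED = GROUP DATA BY a;
UNIQUES = FOREACH GROUPED {
    -- DISTINCT is applied to the bag DATA inside each group
    distinct_bs = DISTINCT DATA.b;
    GENERATE
        group AS a,
        -- join the distinct values with ',' (bs is an illustrative alias)
        BagToString(distinct_bs, ',') AS bs;
}
-- 'uniques_out' is a hypothetical output directory
STORE UNIQUES INTO 'uniques_out' USING PigStorage(',');
```

Because the field separator and the BagToString delimiter are both commas, the stored rows should look like 1,A,B with a varying number of columns per row. Pick a different delimiter if downstream code needs to tell the key from the values.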