My input file is below:
a,t1,1000,100
a,t1,2000,200
a,t1,1000,500
b,t2,1000,200
b,t2,5000,100
This is my script. It throws an error on the SUM. Can you correct it?
myinput = LOAD 'file' USING PigStorage(',') AS (a1:chararray, a2:chararray, total:int, div:int);
for_distinct = FOREACH myinput GENERATE a2;
grp_distinct = GROUP for_distinct ALL;
distinct_count = FOREACH grp_distinct GENERATE COUNT(for_distinct) AS finalcount;
grouped = GROUP myinput BY a1;
result = FOREACH grouped GENERATE group, SUM(myinput.total/myinput.div)/distinct_count;
So the output of grouped is:
((a),{(a,t1,1000,100),(a,t1,2000,200)})
((b),{(b,t2,1000,200),(b,t2,5000,100)})
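Within each group, I want to divide $2 by $3 in every tuple, SUM those quotients, and finally divide that SUM by the number of distinct $1 values.
The sum logic for each bag in grouped is as follows:
[(1000/100)+(2000/200)]/count(distinct $1 in myinput)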
[(1000/200)+(5000/100)]/count(distinct $1 in myinput)
I want the output to look like this:
(a,10)
(b,27)
Answer 0 (score: 0)
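myinput = load 'data' using PigStorage(',') as
(a1:chararray, a2:chararray, total:int, div:int);
-- count the distinct a2 values across the whole input
sub = foreach myinput generate a2;
dist = DISTINCT sub;
grpd = group dist all;
X = foreach grpd generate COUNT_STAR(dist);
-- compute the quotient per row first (int / int, so it truncates)
A = foreach myinput generate a1, (total / div) as quotient;
-- then sum the quotients per a1
grouped = group A by a1;
B = foreach grouped generate group, SUM(A.quotient) as sums;
-- X has a single row, so CROSS attaches the distinct count to every group
C = CROSS B, X;
final = foreach C generate $0, ((float)($1) / (float)($2));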
Output:
(a,11.0)
(b,27.5)
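To see where these numbers come from, the same arithmetic can be cross-checked outside Pig. The following is only a sketch in Python, not part of the Pig script: the function name `ratio_sum_per_group` is made up for illustration, and the rows are copied from the input file in the question.

```python
# Rows copied from the question's input file: (a1, a2, total, div)
rows = [
    ("a", "t1", 1000, 100),
    ("a", "t1", 2000, 200),
    ("a", "t1", 1000, 500),
    ("b", "t2", 1000, 200),
    ("b", "t2", 5000, 100),
]


def ratio_sum_per_group(rows):
    """Hypothetical helper mirroring the Pig steps: per-row integer
    quotient, sum per a1, then divide by the distinct a2 count."""
    # Equivalent of DISTINCT + GROUP ALL + COUNT_STAR on the a2 column
    distinct_a2 = len({a2 for _, a2, _, _ in rows})

    # Equivalent of (total / div) per row followed by SUM per a1.
    # Floor division matches int / int truncation for these positive values.
    sums = {}
    for a1, _, total, div in rows:
        sums[a1] = sums.get(a1, 0) + total // div

    # Equivalent of the final float division after the CROSS
    return {a1: s / distinct_a2 for a1, s in sums.items()}


result = ratio_sum_per_group(rows)
print(result)  # {'a': 11.0, 'b': 27.5}
```

Group a comes out as 11.0 rather than the 10 expected in the question because the input file has a third row for a (a,t1,1000,500) that contributes 1000/500 = 2 to the sum, and the grouped output shown in the question lists only two tuples for a. Group b is 27.5 rather than 27 because the final division is done on floats.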