Sample data:
load data - 1
id name view
1 A 4
2 B 5
3 C 6
load data - 2
id name view
1 A 4
2 B 5
4 D 6
I want this output:
output
id name view
1 A 8
2 B 10
3 C 6
4 D 6
My Pig code:
inputdata = LOAD '/user/hdfs/tes/part-1' AS (
id:chararray,
nama:chararray,
view:int
);
inputdata2 = LOAD '/user/hdfs/tes/part-2' AS (
id:chararray,
nama:chararray,
view:int
);
x = UNION inputdata, inputdata2;
dump x;
How can I sum the view column across the two loaded files, as in the sample data above?
Thanks.
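For reference, UNION on its own only concatenates the two relations. It does not merge rows with the same key, which is why `dump x` shows each of ids 1 and 2 twice. The sketch below is a plain-Python model of that behaviour, not Pig code. The lists `part1` and `part2` are hypothetical stand-ins for the two sample files.

```python
# Plain-Python model of Pig's UNION (not Pig itself).
# part1/part2 are hypothetical in-memory copies of the sample files.
part1 = [("1", "A", 4), ("2", "B", 5), ("3", "C", 6)]
part2 = [("1", "A", 4), ("2", "B", 5), ("4", "D", 6)]

# Like: x = UNION inputdata, inputdata2;
# Rows are simply concatenated; duplicates are kept, nothing is summed.
x = part1 + part2

for row in x:
    print(row)
```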
Answer 0 (score: 3)
Here is a working solution using GROUP BY:
inputdata = LOAD '/user/hdfs/tes/part-1' USING PigStorage(' ') AS (
id:chararray,
nama:chararray,
view:int
);
inputdata2 = LOAD '/user/hdfs/tes/part-2' USING PigStorage(' ') AS (
id:chararray,
nama:chararray,
view:int
);
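-- Concatenate both relations (UNION keeps duplicate rows).
A = UNION inputdata, inputdata2;
-- Group by the composite key; each group carries a bag named A.
B = GROUP A BY (id, nama);
-- Sum the views per key; the inner bag is A, not B.
C = FOREACH B GENERATE group.id, group.nama, SUM(A.view) AS sum_views;
DUMP C;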
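The following plain-Python sketch models the UNION, GROUP BY (id, nama) and SUM steps on the sample data. It is meant to show why the result matches the desired output. It is an illustration only, not how Pig executes the script. The variable names mirror the Pig aliases, and `part1`/`part2` are hypothetical in-memory copies of the two files.

```python
from collections import defaultdict

# Hypothetical in-memory copies of the two sample files.
part1 = [("1", "A", 4), ("2", "B", 5), ("3", "C", 6)]
part2 = [("1", "A", 4), ("2", "B", 5), ("4", "D", 6)]

# A = UNION inputdata, inputdata2;
a = part1 + part2

# B = GROUP A BY (id, nama);  collect the view values per composite key.
b = defaultdict(list)
for id_, nama, view in a:
    b[(id_, nama)].append(view)

# C = FOREACH B GENERATE group.id, group.nama, SUM(A.view);
c = [(id_, nama, sum(views)) for (id_, nama), views in b.items()]

# Pig does not guarantee GROUP output order; sorting here only
# makes the result easy to compare with the expected output.
for row in sorted(c):
    print(*row)
```

If the Pig output must come out in id order, an explicit `ORDER ... BY id` step would be needed after the FOREACH.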
There are other ways to do this; this post on the GROUP operator may help: https://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
Answer 1 (score: 0)
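-- Load both files in one statement (comma-separated paths). With no USING clause,
-- PigStorage splits fields on tabs; add USING PigStorage(' ') for space-delimited files.
inputdata = LOAD 'part-1,part-2' AS (id:chararray, name:chararray, view:int);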
group_inputdata = GROUP inputdata
BY (id, name) ;
count_data = FOREACH group_inputdata
GENERATE FLATTEN(group),
SUM(inputdata.view);
dump count_data;
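This answer uses `FLATTEN(group)` instead of projecting `group.id` and `group.nama`. After grouping by (id, name), each record's `group` field is a tuple, and FLATTEN unnests it into separate columns. Without FLATTEN, the key would stay nested, as in `((1,A),8)`. The plain-Python sketch below only models that reshaping. `grouped` is a hypothetical dict standing in for the grouped relation.

```python
# Hypothetical stand-in for the grouped relation: key tuple -> bag of view values.
grouped = {("1", "A"): [4, 4], ("2", "B"): [5, 5], ("3", "C"): [6], ("4", "D"): [6]}

# GENERATE FLATTEN(group), SUM(inputdata.view): key tuple unnested into columns.
flattened = [(*key, sum(views)) for key, views in grouped.items()]

# GENERATE group, SUM(inputdata.view): key tuple kept as one nested field.
nested = [(key, sum(views)) for key, views in grouped.items()]

print(flattened[0])  # ('1', 'A', 8)
print(nested[0])     # (('1', 'A'), 8)
```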