How to sum a column (view) across multiple loaded files in Pig

Date: 2015-10-26 04:10:02

Tags: hadoop apache-pig

I have a problem. I want to sum the view column across two loaded files.

Sample data:

load data - 1
id name view
1  A    4
2  B    5
3  C    6

load data - 2
id name view
1  A    4
2  B    5
4  D    6

The output I want:

output
id name view
1  A    8
2  B    10
3  C    6
4  D    6

My Pig code:

inputdata = LOAD '/user/hdfs/tes/part-1' AS (
    id:chararray, 
    nama:chararray, 
    view:int
);


inputdata2 = LOAD '/user/hdfs/tes/part-2' AS (
    id:chararray, 
    nama:chararray, 
    view:int
);

x = UNION inputdata, inputdata2;

dump x;

How can I sum the view column across the two loaded files, as in the sample data?

Thanks.

2 answers:

Answer 0 (score: 3)

Here is a working solution using GROUP BY:

inputdata = LOAD '/user/hdfs/tes/part-1' USING PigStorage(' ') AS (
    id:chararray, 
    nama:chararray, 
    view:int
);


inputdata2 = LOAD '/user/hdfs/tes/part-2' USING PigStorage(' ') AS (
    id:chararray, 
    nama:chararray, 
    view:int
);

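-- Combine both inputs, then group rows that share the same id and name.
A = UNION inputdata, inputdata2;
B = GROUP A BY (id, nama);
-- After GROUP, the bag of grouped rows is named after the input relation (A), not B.
C = FOREACH B GENERATE group.id, group.nama, SUM(A.view) AS sum_views;
DUMP C;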

There are other ways to do this. This link may help: https://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
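
As one such alternative, here is a minimal sketch that uses COGROUP instead of UNION followed by GROUP. It assumes the same paths as in the question and files delimited by a single space with no header row. The relation names `grouped` and `totals` are made up for illustration. SUM returns null for an empty bag, so a key that appears in only one file (ids 3 and 4 in the sample data) needs the IsEmpty guard. Without it the total for that key would be null.

```pig
-- Load both files with the same schema (paths and delimiter as in the question and answer).
inputdata  = LOAD '/user/hdfs/tes/part-1' USING PigStorage(' ') AS (id:chararray, nama:chararray, view:int);
inputdata2 = LOAD '/user/hdfs/tes/part-2' USING PigStorage(' ') AS (id:chararray, nama:chararray, view:int);

-- COGROUP keeps one bag per input for every (id, nama) key found in either file.
grouped = COGROUP inputdata BY (id, nama), inputdata2 BY (id, nama);

-- SUM over an empty bag yields null, so substitute 0 when a key is missing from one file.
totals = FOREACH grouped GENERATE
    FLATTEN(group) AS (id, nama),
    (IsEmpty(inputdata)  ? 0L : SUM(inputdata.view)) +
    (IsEmpty(inputdata2) ? 0L : SUM(inputdata2.view)) AS sum_views;

DUMP totals;
```

For only two inputs with identical schemas, UNION followed by GROUP is shorter. COGROUP is mainly useful when the per-file totals need to stay distinguishable.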

Answer 1 (score: 0)

inputdata = LOAD 'part-1,part-2' AS (id:chararray, name:chararray, view:int);

group_inputdata = GROUP inputdata
                        BY (id, name) ;

count_data = FOREACH group_inputdata
             GENERATE FLATTEN(group),
                      SUM(inputdata.view);

dump count_data;
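
The snippet above relies on LOAD accepting a comma-separated list of paths. A relative path such as `part-1` is resolved against the current working directory in HDFS rather than `/user/hdfs/tes`. A hedged variant with a Hadoop-style glob and absolute paths might look like the following. It assumes the directory from the question, tab-delimited files (the default when no USING clause is given), and that every file matching `part-*` in that directory should be included. The names `grouped` and `totals` are illustrative.

```pig
-- The glob matches part-1 and part-2, plus any other part-* file in the directory.
inputdata = LOAD '/user/hdfs/tes/part-*' AS (id:chararray, name:chararray, view:int);

-- Group rows that share the same id and name.
grouped = GROUP inputdata BY (id, name);

-- Flatten the group key back into columns and sum the views per key.
totals = FOREACH grouped GENERATE
    FLATTEN(group) AS (id, name),
    SUM(inputdata.view) AS sum_views;

DUMP totals;
```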