How to sum a specific column in Pig when keys match

Time: 2017-07-06 18:36:15

Tags: hadoop apache-pig

I have sample data from which I need the following output (O/P):

O/P:

(A,{(p,10)})

(B,{(q,40),(p,30)})

(C,{(t,60),(q,20)})

This is what I have after loading the data into Pig:

Lo = load 'pivot.txt' using PigStorage (',') as (id:chararray, code:chararray, key:chararray, value:int);
Aa = group Lo by (code);
Bb = foreach Aa {AUX = foreach Lo generate $0,$2,$3;generate group, AUX;}

dump Bb:
(A,{(1,p,10)})
(B,{(3,q,20),(3,p,30),(2,q,20)})
(C,{(3,t,60),(3,q,20)})

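The id can be dropped. For each code, the output should contain the sum of all values whose key matches. In the example above, (q,20) appears twice under code B, so the two values are added and become (q,40).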

That is the code I have so far, but it does not produce the exact output I need.

I am stuck at this point and would really appreciate any help.

Thanks, Rohith

1 answer:

Answer 0: (score: 3)

Pig script:

input_data = LOAD 'input.csv' USING PigStorage(',') AS (id:int,code:chararray,key:chararray,value:int);
req_stats = FOREACH(GROUP input_data BY (code,key)) GENERATE FLATTEN(group) AS (code,key), SUM(input_data.value) AS value;
req_stats_fmt = FOREACH(GROUP req_stats BY code) GENERATE group AS code, req_stats.(key,value);
DUMP req_stats_fmt;

Input:

1,A,p,10
2,B,q,20
3,B,p,30
3,B,q,20
3,C,t,60
3,C,q,20

Output: DUMP req_stats_fmt

(A,{(p,10)})
(B,{(q,40),(p,30)})
(C,{(t,60),(q,20)})
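
For anyone who wants to check the logic without a Pig installation, here is a rough plain-Python model of the two grouping steps in the script above. It is only a sketch of the semantics, not Pig code: the function name `sum_by_code_and_key` is made up for illustration, and the rows are copied from the input shown in this answer. Step 1 mirrors the GROUP BY (code,key) with SUM(value), and step 2 mirrors the second GROUP BY code.

```python
from collections import defaultdict

# Rows copied from the answer's input: (id, code, key, value)
rows = [
    (1, "A", "p", 10),
    (2, "B", "q", 20),
    (3, "B", "p", 30),
    (3, "B", "q", 20),
    (3, "C", "t", 60),
    (3, "C", "q", 20),
]


def sum_by_code_and_key(rows):
    """Hypothetical helper modelling the two GROUP steps of the Pig script."""
    # Step 1: group by (code, key) and sum the values; the id is dropped.
    totals = defaultdict(int)
    for _id, code, key, value in rows:
        totals[(code, key)] += value

    # Step 2: regroup by code, collecting the summed (key, value) pairs.
    by_code = defaultdict(dict)
    for (code, key), total in totals.items():
        by_code[code][key] = total
    return {code: dict(pairs) for code, pairs in by_code.items()}


result = sum_by_code_and_key(rows)
for code in sorted(result):
    # Sorted only to make the printout stable; a Pig bag has no guaranteed order.
    print(code, sorted(result[code].items()))
```

The two (q,20) rows under code B collapse into a single (q,40) entry, which is the behaviour asked for in the question. The order of tuples inside each bag may differ from what Pig prints.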