Pig script - join after multiple operations (GROUP and FLATTEN)

Time: 2014-03-27 19:24:31

Tags: join group-by apache-pig flatten

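I am new to Pig programming. I have a relation with multiple fields (I have simplified the schema in the example below). I run several calculations on it and then try to join the results at the end. The join returns nothing, even though the schema looks correct when I run DESCRIBE. When I look at the syntax check, the only thing that catches my attention is this warning: WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY.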

Input:

(123,1,-52.39,-1,2006-05-15)
(123,1,-52.39,-1,2007-04-04)
(123,2,-55.15,-1,2006-05-15)
(123,3,-49.64,-1,2006-05-15)
(123,4,52.39,1,2006-05-15)
(123,4,-52.39,-1,2007-04-04)
(123,4,52.39,1,2007-04-04)
(123,4,-52.39,-1,2007-04-09)
(123,5,86.86,1,2007-04-04)
(123,5,-86.86,-1,2007-04-09)

Desired output:

(123,1,-104.78,-2,2007-04-04)
(123,2,-55.15,-1,2006-05-15)
(123,3,-49.64,-1,2006-05-15)
(123,4,0,0,2007-04-09)
(123,5,0,0,2007-04-09)

c1 = load 'file.csv' using PigStorage(',') as (ID, LN, PAY_AMT:double, UNIT_QTY:int, PD_DT);
c2 = FOREACH c1 GENERATE ID, LN, PAY_AMT, UNIT_QTY;
c3 = group c2 by (ID, LN);
c3agg = FOREACH c3 GENERATE FLATTEN(group) as (ID, LN),
      SUM(c2.PAY_AMT) as PdAmt, SUM(c2.UNIT_QTY) as Unit_qty;
  

describe c3agg;
c3agg: {ID: bytearray,LN: bytearray,PdAmt: double,Unit_qty: long}

Next I try to get MAX(PD_DT). Using the actual MAX function did not work (or at least I could not get it to work), so I use the code below instead.

c4 = foreach c1 generate ID, LN, PD_DT;
c5 = group c4 by (ID, LN);
c3dt = FOREACH c5 {                 -- get MAX(PD_DT)
    c5ord = ORDER c4 by PD_DT DESC;
    c5lmt = LIMIT c5ord 1;
    GENERATE FLATTEN(c5lmt);
};
  

describe c3dt;
c3dt: {c5lmt::ID: bytearray,c5lmt::LN: bytearray,c5lmt::PD_DT: bytearray}
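For comparison, here is a minimal sketch of the same step written with MAX. It assumes that PD_DT is declared as chararray at load time, which the script above does not do. As far as I understand the Pig documentation, MAX on an untyped (bytearray) field casts the values to double, which would not be meaningful for date strings, whereas MAX on chararray compares strings, and yyyy-MM-dd strings sort in chronological order. The aliases d1, d2, d3 and d3dt are illustrative names only.

```pig
-- Assumption: fields are typed explicitly; the original script leaves ID, LN and PD_DT untyped.
d1 = LOAD 'file.csv' USING PigStorage(',')
     AS (ID:chararray, LN:int, PAY_AMT:double, UNIT_QTY:int, PD_DT:chararray);

-- Keep only the key fields and the date.
d2 = FOREACH d1 GENERATE ID, LN, PD_DT;
d3 = GROUP d2 BY (ID, LN);

-- MAX over a bag of chararray values returns the lexicographically largest string,
-- which for yyyy-MM-dd dates is the latest date in each group.
d3dt = FOREACH d3 GENERATE FLATTEN(group) AS (ID, LN), MAX(d2.PD_DT) AS PD_DT;

DESCRIBE d3dt;
```

This yields one row per (ID, LN) with plain field names rather than the c5lmt:: prefix, but I have not confirmed whether that makes any difference to the join.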

Now I try the join, and it returns nothing:

cj = JOIN c3agg BY (ID, LN), c3dt BY (ID, LN);
dump cj;

I also tried using field positions, with the same result:

cj = JOIN c3agg BY ($0, $1), c3dt BY ($0, $1);

describe cj;
cj: {c3agg::ID: bytearray,c3agg::LN: bytearray,c3agg::PdAmt: double,c3agg::Unit_qty: long,c3dt::c5lmt::ID: bytearray,c3dt::c5lmt::LN: bytearray,c3dt::c5lmt::PD_DT: bytearray}

I also tried declaring the field types, for example ID:chararray and LN:int, but still got no results. I really can't figure out what I am doing wrong.
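Since both intermediate results are grouped on the same (ID, LN) key, one possible workaround is to compute all three aggregates in a single GROUP and skip the JOIN entirely. The sketch below assumes typed fields and a chararray PD_DT in yyyy-MM-dd format, so that MAX picks the latest date. The aliases t1, tg and tagg are illustrative. It does not explain why the join comes back empty; it only sidesteps it.

```pig
-- Assumption: explicit types on every field, with PD_DT as a yyyy-MM-dd string.
t1 = LOAD 'file.csv' USING PigStorage(',')
     AS (ID:chararray, LN:int, PAY_AMT:double, UNIT_QTY:int, PD_DT:chararray);

-- One grouping on the shared key instead of two groupings plus a join.
tg = GROUP t1 BY (ID, LN);

-- Sum the amounts and quantities and take the latest date per (ID, LN).
tagg = FOREACH tg GENERATE
         FLATTEN(group)   AS (ID, LN),
         SUM(t1.PAY_AMT)  AS PdAmt,
         SUM(t1.UNIT_QTY) AS Unit_qty,
         MAX(t1.PD_DT)    AS PD_DT;

DUMP tagg;
```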

Thanks!

0 answers:

No answers yet