Apache Pig - Create a new column based on another value

Date: 2016-08-11 16:36:55

Tags: foreach apache-pig valueconverter

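One of my columns (named Product) is defined as a chararray and has three possible values: OT, AT and HP. I want to create a new column that converts these values to integers: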

  1. OT = 1
  2. AT = 2
  3. HP = 3

To do this I wrote a FOREACH statement:

    REGISTER '/usr/lib/pig/piggybank.jar';
    
    File = load '/user/cloudera/file.csv'  
        USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
          as (ID:Long, 
              Chain:Int,
              Dept:Int,
              Product_Measure:Chararray,
              Price:Double);
    
    
    Values = FOREACH File Generate
                 ID,
                 Chain,
                 Dept,
                 ((Chararray)Product_Measure=='OT'?'1':(Chararray)Product_Measure=='AT'?'2':(Chararray)Product_Measure=='HP'?'3':'0') as Product_Measure,
                 (Price<0.1?0:Price) as Price;
    
    Filter_Values = FILTER Values BY  Price > 0;
    
    DUMP  Filter_Values;
    

If I delete the third line it works fine, so I think the problem occurs when I try to convert the chararray to an int.

Can anyone help me?

Thanks!

1 Answer:

Answer 0: (score: 0)

    Values = FOREACH Source Generate
                 ID,
                 Date,
                 ((Chararray)Product_Measure == 'OT' ? (int)1 : ((Chararray)Product_Measure == 'AT' ? (int)2 : ((Chararray)Product_Measure == 'HP' ? (int)3 : 0))) as Product_Value,
                 (Quantity<0?0:Quantity) as Quantity,
                 (Price<0.1?0:Price) as Price;

Or, if you want NULL instead:

    Values = FOREACH Source Generate
                 ID,
                 Date,
                 ((Chararray)Product_Measure == 'OT' ? '1' : ((Chararray)Product_Measure == 'AT' ? '2' : ((Chararray)Product_Measure == 'HP' ? '3' : 'NULL'))) as Product_Value,
                 (Quantity<0?0:Quantity) as Quantity,
                 (Price<0.1?0:Price) as Price;

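You need to make two changes to your Pig script. First, use == instead of = in the comparisons. Second, keep every replacement value the same type: if you want a NULL value, make them all chararray (note that 'NULL' in quotes is a chararray literal, not a true null); otherwise make them all int.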
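
As an alternative to nested bincond operators, Pig 0.12 and later also provide a CASE expression, which may read more clearly for this kind of lookup. The following is only a minimal sketch: it assumes Pig 0.12 or newer and the File relation and schema from the question, and Product_Value is just an illustrative alias.

```pig
-- Sketch only: assumes Pig 0.12+ and the File relation loaded in the question.
-- Every THEN/ELSE branch returns an int, so Product_Value is an int column.
-- The Price branch uses 0.0 so that both sides of the bincond are doubles.
Values = FOREACH File GENERATE
             ID,
             Chain,
             Dept,
             (CASE Product_Measure
                  WHEN 'OT' THEN 1
                  WHEN 'AT' THEN 2
                  WHEN 'HP' THEN 3
                  ELSE 0
              END) AS Product_Value,
             (Price < 0.1 ? 0.0 : Price) AS Price;
```

As far as I know, leaving out the ELSE branch should yield a real null (rather than the string 'NULL') for any value other than OT, AT or HP.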