Pig programming logic

Time: 2014-11-09 11:34:46

Tags: apache-pig

10278929012|HDFC1001|SBI|2014-08-03|8000|S
10278929012|HDFC1001|HDFC|2014-08-17|500|S

I need to find out whether the atm_id belongs to the same bank as bank_name, and then add an indicator for it.

I need output like this:

10278929012|HDFC1001|SBI|diff_bank
10278929012|HDFC1001|HDFC|same_bank


atm_trans = LOAD '/user/cloudera/inputfiles/atm_trans.txt' USING PigStorage('|') as(accnt_no:long,atm_id:chararray,bank_name :chararray,date:chararray,amt:chararray,status:chararray);

atm_trans_each = foreach atm_trans generate accnt_no,atm_id,bank_name,(bank_name matches atm_id ?'same_bank' : 'diff_bank') as ind;

dump atm_trans_each;

But I get a syntax error. Can someone correct it and give me the right statement to get this output?

1 Answer:

Answer 0: (score: 0)

Can you try this?

input.txt:

10278929012|HDFC1001|SBI|2014-08-03|8000|S
10278929012|HDFC1001|HDFC|2014-08-17|500|S

PigScript:

atm_trans = LOAD 'input.txt' USING PigStorage('|') as(accnt_no:long,atm_id:chararray,bank_name:chararray,date:chararray,amt:chararray,status:chararray);
atm_trans_each = foreach atm_trans generate accnt_no,atm_id,bank_name,((STARTSWITH(atm_id,bank_name)== true)?'same_bank':'diff_bank') as ind;
STORE atm_trans_each INTO 'output' USING PigStorage('|');

Update: for Pig version 0.8

atm_trans_each = foreach atm_trans generate accnt_no,atm_id,bank_name,((REGEX_EXTRACT(atm_id,'([A-Za-z]+)',1) == bank_name)?'same_bank':'diff_bank');
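
For reference, the regex variant can also be written out as a complete script. This is only a sketch under a few assumptions: the bank code is always the leading run of letters in atm_id (as in HDFC1001), the input is the same input.txt used above, and the new column is given the alias ind so it has a name in the schema. The `^` anchor is an addition for illustration; it restricts the match to the start of atm_id, so letters appearing later in the ID are not picked up by mistake.

```pig
-- Load the pipe-delimited transactions (same schema as in the script above).
atm_trans = LOAD 'input.txt' USING PigStorage('|')
    AS (accnt_no:long, atm_id:chararray, bank_name:chararray,
        date:chararray, amt:chararray, status:chararray);

-- Assumption: the bank code is the leading letters of atm_id,
-- e.g. 'HDFC' in 'HDFC1001'. Extract it and compare it with bank_name.
atm_trans_each = FOREACH atm_trans GENERATE accnt_no, atm_id, bank_name,
    ((REGEX_EXTRACT(atm_id, '^([A-Za-z]+)', 1) == bank_name)
        ? 'same_bank' : 'diff_bank') AS ind;

DUMP atm_trans_each;
```

Note that this compares the whole letter prefix for equality, whereas the STARTSWITH version only checks that atm_id begins with bank_name, so the two can disagree when one bank code is a prefix of another.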

Output:

10278929012|HDFC1001|SBI|diff_bank
10278929012|HDFC1001|HDFC|same_bank