Pig filter on a positional variable

Date: 2016-09-07 18:37:56

Tags: hadoop apache-pig

I am trying to filter on a positional variable.

X = FILTER C BY($14 matches '.*USD.*');
STORE X into '$output' using PigStorage(',');

The statement above does not work, but if I try to output $14

E = FOREACH C GENERATE FLATTEN($14);
STORE E into '$output' using PigStorage(',');

it works fine.

Sample data:

304a285281be,1383027928890968764,receiver,10C,655362,C2,USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,USD120
304a285281be,1383027928890968764,receiver,10C,655362,C2,USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,USD0
304a285281be,1383027928890968764,receiver,10C,655362,C2,USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,GBP0
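For reference, Pig positional references start at $0, so $14 is the 15th comma-separated field, i.e. the last column of each sample row (USD120, USD0, GBP0). The plain-Python sketch below is not Pig; it assumes these simple, unquoted rows split the same way PigStorage(',') would split them, and `ROWS` and `positional_field` are illustrative names only.

```python
# Illustrative only: mimics how PigStorage(',') splits these simple, unquoted rows.
PREFIX = ("304a285281be,1383027928890968764,receiver,10C,655362,C2,"
          "USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,")
ROWS = [PREFIX + last for last in ("USD120", "USD0", "GBP0")]


def positional_field(line: str, index: int, delimiter: str = ",") -> str:
    """Return field $index of a delimited line (Pig positions start at $0)."""
    return line.split(delimiter)[index]


# $14 picks the last of the 15 fields in each sample row.
fields = [positional_field(row, 14) for row in ROWS]
print(fields)  # ['USD120', 'USD0', 'GBP0']
```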

Sample output:

304a285281be,1383027928890968764,receiver,10C,655362,C2,USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,USD0
304a285281be,1383027928890968764,receiver,10C,655362,C2,USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,GBP0

2 answers:

Answer 0 (score: 0)

Add a space between 'BY' and '(':

    X = FILTER C BY (FLATTEN($14) matches '.*USD.*');
    STORE X into '$output' using PigStorage(',');

Answer 1 (score: 0)

This makes sense to me:

A = LOAD 'StackFile.txt'  using PigStorage(',');
B = FILTER A BY ($14 matches '.*USD.*');
DUMP B;

304a285281be,1383027928890968764,receiver,10C,655362,C2,USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,USD120
304a285281be,1383027928890968764,receiver,10C,655362,C2,USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,USD0
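The pattern needs .* on both sides because Pig's `matches` uses Java regular expressions and, like Java's `String.matches`, only succeeds when the pattern matches the entire field. A bare 'USD' would therefore not match USD120. The sketch below emulates that full-match behaviour in plain Python with `re.fullmatch` and applies it to the sample rows. `pig_matches` and `filter_by_field` are hypothetical helpers for illustration, not Pig APIs. The two rows it keeps correspond to the two USD rows in the DUMP output above.

```python
import re

PREFIX = ("304a285281be,1383027928890968764,receiver,10C,655362,C2,"
          "USD811289,1,0,0,ebay_checkout,cc,cc,USD2659,")
ROWS = [PREFIX + last for last in ("USD120", "USD0", "GBP0")]


def pig_matches(value: str, pattern: str) -> bool:
    """Approximate Pig's `matches`: the regex must cover the whole value."""
    return re.fullmatch(pattern, value) is not None


def filter_by_field(rows, index, pattern, delimiter=","):
    """Rough equivalent of: FILTER rel BY ($index matches pattern)."""
    return [row for row in rows
            if pig_matches(row.split(delimiter)[index], pattern)]


kept = filter_by_field(ROWS, 14, ".*USD.*")
print(len(kept))                     # 2 (the USD120 and USD0 rows)
print(pig_matches("USD120", "USD"))  # False: no full match without .*
```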