Modify a string to a comma-separated line - Apache Pig

Date: 2016-09-25 15:12:14

Tags: string group-by apache-pig

After running my Pig script:

FILE = LOAD 'PATH_FILE'
    USING PigStorage(',') AS
      (ID:long,
       MUNICIPALITY:chararray,  -- Pig has no String type; chararray is its string type
       CITY:int,
       COUNTRY:int,
       COMPANY:long,
       BRAND:long,
       DATE:chararray,
       STOCK_NAME:chararray,
       STOCK_SIZE:double,
       STOCK_AMOUNT:double);

DATA = GROUP FILE BY (ID,MUNICIPALITY);

GRP_DATA = FOREACH DATA GENERATE
    group AS STOCK_ID,
    FILE.COMPANY AS COMPANY,
    FILE.BRAND AS BRAND,
    FILE.DATE AS DATE,
    FILE.STOCK_NAME AS STOCK_NAME,
    SUM(FILE.STOCK_AMOUNT) AS STOCK_AMOUNT;

RANKING = rank GRP_DATA by STOCK_NAME,COMPANY,BRAND;

STORE RANKING INTO 'PATH_DESTINATION' USING PigStorage(',');

I get this output:

1,(7287026502032012,18),{(706)},{(101200010)},{(17286)},{(oz)},2.5

How can I use Pig to get this line instead:

 1,7287026502032012,18,706,101200010,17286,oz,2.5

Is this possible?

Thank you very much!

1 answer:

Answer 0 (score: 0)

You can use a regular expression to remove every ( ) { } character from the line:

[(){}]+

See the regex demo.

In Pig:

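-- Load each stored line as one chararray field (the default tab delimiter leaves the commas intact).
A = LOAD 'input.txt' AS (line:chararray);
-- Strip every run of parentheses and braces.
B = FOREACH A GENERATE REPLACE(line, '[(){}]+', '');
DUMP B;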
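
Note that the regex also strips any parentheses or braces that occur inside the data itself, for example in a STOCK_NAME value. As an alternative to post-processing the stored text, the nesting could be avoided in the question's script before it is written out. The following is only a minimal sketch and not part of the answer above: it assumes your Pig version provides the built-in BagToString, that the '|' character never appears in the data, and it reuses the FILE and DATA relations from the question. The alias FLAT_DATA and the '|' delimiter are illustrative choices.

```pig
-- FILE and DATA are the relations defined in the question's script.
FLAT_DATA = FOREACH DATA GENERATE
    FLATTEN(group) AS (ID, MUNICIPALITY),            -- unpack the (ID, MUNICIPALITY) key tuple
    BagToString(FILE.COMPANY, '|')    AS COMPANY,    -- join each bag's values into one string
    BagToString(FILE.BRAND, '|')      AS BRAND,
    BagToString(FILE.DATE, '|')       AS DATE,
    BagToString(FILE.STOCK_NAME, '|') AS STOCK_NAME,
    SUM(FILE.STOCK_AMOUNT)            AS STOCK_AMOUNT;

RANKING = RANK FLAT_DATA BY STOCK_NAME, COMPANY, BRAND;

STORE RANKING INTO 'PATH_DESTINATION' USING PigStorage(',');
```

FLATTEN turns the group tuple into two top-level fields, and BagToString collapses each bag into a plain string, so a group with a single row should be stored as flat comma-separated fields. For groups with several rows, the values of each column are joined with '|'. Because the collapsed columns are strings, RANK orders them as text rather than as numbers.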