In Pig

Time: 2017-10-04 09:23:20

Tags: hadoop apache-pig

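I have the single raw file shown below and need to split it into different relations.

If a line starts with 0, the whole line should go to the relation 'header'.

If a line starts with 1, the whole line should go to the relation 'ban'.

If a line starts with 2, the whole line should go to the relation 'sub'.

If a line starts with 3, the whole line should go to the relation 'item'.

If a line starts with 4, the whole line should go to the relation 'tax'.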

0ALH   012012050104.00.00356.0012.06001

1980377362   HAW R 120010000IRN+000016323SABRINA D. ORTIZ                                            PO BOX 1764                                                                                                                                                                                             KAILUA KONA               HI967451764September 2009      03.4June 2008           06.0E   00

2980377362   8089363822    HAW  120010000SABRINA D. ORTIZ                                            75-1027 HENRY ST                                                                                                                                                                                        KAILUA KONA               HI967403154September 2009      03.4June 2008           06.0EN00

2980377362   8089375559    HAW  120010000SABRINA D. ORTIZ                                            75-1027 HENRY ST                                                                                                                                                                                        KAILUA KONA               HI967403154September 2009      03.4June 2008           06.0EN00

3980377362   8089363822             911FEEO      O           SNOTAX1001+000000066201205029-1-1 Service Fee                                                                                                                     0000004950533060000002163C

3980377362   8089363822    GSMUSELASCPKG  R      R           S          00000000020120502Custom Call Package                                                                                                                   000000495053163           

4980377362   8089363822    MSGFTM2AMM2ABUNR     L+000003000U    105      +04160000+000000125 0000000000000000495053186

4980377362   8089363822    MSGFTM2AMM2ABUNR     L+000003000U    131      +00084600+000000003 0000000000000000495053186

4980377362   8089363822    MSGFTM2AMM2ABUNR     L+000003000U    133      +04146600+000000124 0000000000000000495053186

Could you please help me do this with a Pig script?

1 answer:

Answer 0 (score: 1)

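Load each line of the file into a single field. Then take the first character of each line, compare it with the value you are looking for, and use SPLIT to route the line into the matching relation.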

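-- Load every raw line as one chararray field.
A = LOAD '/path/file.txt' USING TextLoader() AS (line:chararray);

-- Route each line by its first character. Refer to the field as plain `line`:
-- inside SPLIT, `A.line` would be read as a scalar projection of the whole relation.
SPLIT A INTO header IF SUBSTRING(line, 0, 1) == '0',
             ban    IF SUBSTRING(line, 0, 1) == '1',
             sub    IF SUBSTRING(line, 0, 1) == '2',
             item   IF SUBSTRING(line, 0, 1) == '3',
             tax    IF SUBSTRING(line, 0, 1) == '4';

-- Print each relation to the console to check the result.
DUMP header;
DUMP ban;
DUMP sub;
DUMP item;
DUMP tax;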
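
One possible variation on the script above, offered as a sketch and not a tested solution. It adds an OTHERWISE branch to SPLIT (available in Pig 0.10 and later) so that lines whose first character is not 0 to 4 are collected and not silently discarded. It also writes each relation out with STORE. The relation name `unmatched` and the `/path/out/...` directories are hypothetical placeholders that do not come from the question. Blank or null lines may still fall out of every branch, depending on how the conditions evaluate.

```pig
-- Load every raw line as one chararray field.
A = LOAD '/path/file.txt' USING TextLoader() AS (line:chararray);

-- Route each line by its first character.
-- 'unmatched' (hypothetical name) collects lines with any other first character.
SPLIT A INTO header    IF SUBSTRING(line, 0, 1) == '0',
             ban       IF SUBSTRING(line, 0, 1) == '1',
             sub       IF SUBSTRING(line, 0, 1) == '2',
             item      IF SUBSTRING(line, 0, 1) == '3',
             tax       IF SUBSTRING(line, 0, 1) == '4',
             unmatched OTHERWISE;

-- Write each relation to its own output directory (paths are placeholders).
STORE header    INTO '/path/out/header';
STORE ban       INTO '/path/out/ban';
STORE sub       INTO '/path/out/sub';
STORE item      INTO '/path/out/item';
STORE tax       INTO '/path/out/tax';
STORE unmatched INTO '/path/out/unmatched';
```

For a quick check on a small sample, DUMP can still replace any of the STORE statements.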