Apache Pig: reading name-value pairs from a data file

Date: 2016-03-15 22:07:54

Tags: apache-pig

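I have a sample Pig script that reads a CSV file and dumps it to the screen. My real data, however, consists of name-value pairs. How can I read a line of name-value pairs and split each pair into its field name and its value?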

Data:

1,Smith,Bob,Business Development
2,Doe,John,Developer
3,Jane,Sally,Tester

Script:

data = LOAD 'example-data.txt' USING PigStorage(',') 
           AS (id:chararray, last_name:chararray, 
           first_name:chararray, role:chararray);
DESCRIBE data;
DUMP data;

Output:

data: {id: chararray,last_name: chararray,first_name: chararray,role: chararray}
(1,Smith,Bob,Business Development)
(2,Doe,John,Developer)
(3,Jane,Sally,Tester)

Given the following input instead (as name-value pairs), how do I process the data to end up with the same "data object"?

id=1,last_name=Smith,first_name=Bob,role=Business Development
id=2,last_name=Doe,first_name=John,role=Developer
id=3,last_name=Jane,first_name=Sally,role=Tester

1 answer:

Answer 0 (score: 0)

See STRSPLIT. Load each name=value pair as a single field, then split it on '=':

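-- Load each comma-separated name=value pair as one chararray field.
A = LOAD 'example-data.txt' USING PigStorage(',') AS (f1:chararray,f2:chararray,f3:chararray, f4:chararray);
-- Split each pair on the first '=' only (limit 2), so a value containing '=' stays intact.
B = FOREACH A GENERATE
               FLATTEN(STRSPLIT(f1,'=',2)) as (n1:chararray,v1:chararray),
               FLATTEN(STRSPLIT(f2,'=',2)) as (n2:chararray,v2:chararray),
               FLATTEN(STRSPLIT(f3,'=',2)) as (n3:chararray,v3:chararray),
               FLATTEN(STRSPLIT(f4,'=',2)) as (n4:chararray,v4:chararray);
-- Keep only the values; the names (n1..n4) are dropped.
C = FOREACH B GENERATE v1,v2,v3,v4;
DUMP C;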
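
The relation C above exposes its fields as v1..v4, whereas the question asks for the same schema as the original CSV load (id, last_name, first_name, role). One way to get there is to name the value half of each pair directly in the FLATTEN schema. This is a minimal sketch, assuming every line lists exactly four pairs in this fixed order; the aliases raw and pairs and the placeholder names k1..k4 are illustrative, not part of the original answer:

-- Assumption: each line holds four name=value pairs, in the order
-- id, last_name, first_name, role.
raw = LOAD 'example-data.txt' USING PigStorage(',')
          AS (f1:chararray, f2:chararray, f3:chararray, f4:chararray);
-- Split each pair on the first '=' and name the value after its field.
pairs = FOREACH raw GENERATE
            FLATTEN(STRSPLIT(f1, '=', 2)) AS (k1:chararray, id:chararray),
            FLATTEN(STRSPLIT(f2, '=', 2)) AS (k2:chararray, last_name:chararray),
            FLATTEN(STRSPLIT(f3, '=', 2)) AS (k3:chararray, first_name:chararray),
            FLATTEN(STRSPLIT(f4, '=', 2)) AS (k4:chararray, role:chararray);
-- Drop the names so only the four named values remain.
data = FOREACH pairs GENERATE id, last_name, first_name, role;
DESCRIBE data;
DUMP data;

Note that this labels values by position, not by the names found in the file. If the pairs can appear in a different order, or a value can itself contain a comma, positional splitting will mislabel or truncate fields and a different parsing approach would be needed.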