How to get this output from Pig Latin in MapReduce

Time: 2016-06-10 12:03:26

Tags: mapreduce apache-pig

I want to get the following output from Pig Latin / Hadoop:
((39,50,60,42,15,Bachelor,Male),5)
((40,35,HS-grad,Male),2)
((39,45,15,30,12,7,HS-grad,Female),6)

The input is a data sample of the adult data set.

I wrote the following Pig Latin script:

sensitive = LOAD '/mdsba/sample2.csv' USING PigStorage(',') AS (AGE,EDU,SEX,SALARY);
BV = GROUP sensitive BY (EDU,SEX);
BVA = FOREACH BV GENERATE group AS EDU, COUNT(sensitive) AS dd:long;
DUMP BVA;

Unfortunately, the result looks like this:

((Bachelor,Male),5)
((HS-grad,Male),2)

1 Answer:

Answer 0 (score: 1)

Try projecting the AGE data as well, like this:

BVA= foreach BV generate 
    sensitive.AGE as AGE,
    FLATTEN(group) as (EDU,SEX), 
    COUNT(sensitive) as dd:long;

Another suggestion is to specify the data types when loading the data:

sensitive = LOAD '/mdsba/sample2.csv' using PigStorage(',') as (AGE:int,EDU:chararray,SEX:chararray,SALARY:chararray);
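
Putting both suggestions together, a minimal sketch of the whole script could look like the one below. Note that projecting `sensitive.AGE` yields a bag, so `DUMP` prints the ages wrapped in `{(...)}` rather than as a flat list. The sketch therefore passes the bag through the built-in `BagToString` to join the ages into one comma-separated string; this is my own addition, not something from the question, and it assumes a Pig version that ships `BagToString`. The alias `AGES` is just an illustrative name, and the order of the ages inside each group is not guaranteed.

```pig
-- Load with explicit types, as suggested above.
sensitive = LOAD '/mdsba/sample2.csv' USING PigStorage(',')
            AS (AGE:int, EDU:chararray, SEX:chararray, SALARY:chararray);

-- One group per (EDU, SEX) pair; each group holds a bag of the matching rows.
BV = GROUP sensitive BY (EDU, SEX);

-- Assumption: BagToString is available as a built-in in your Pig version.
-- It joins the AGE values of each group into a single string.
BVA = FOREACH BV GENERATE
          BagToString(sensitive.AGE, ',') AS AGES,
          FLATTEN(group) AS (EDU, SEX),
          COUNT(sensitive) AS dd;

DUMP BVA;
```

If you would rather keep the ages as a bag, drop the `BagToString` call and generate `sensitive.AGE` directly, as in the snippet earlier in this answer.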