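I am new to Hadoop and Pig. I have worked through the problem as far as the script below, but how can I compare each person's salary with the average salary of their department? Here is the script I wrote to get the average salary of each department: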
A = LOAD 'Assignment_1_Input.log' USING PigStorage('\t') as (id:int,name:chararray,age:int,salary:int,deptid:int);
B = GROUP A by deptid;
STORE B INTO 'Assign1GrpByNew';
C = FOREACH B GENERATE group as grpId,AVG(A.salary) as grpAvgSal;
DUMP C;
Input file:
15878 mohan 24 8000 1
19173 ramya 27 10000 1
9527 krishna 35 40000 2
9528 raj 36 60000 2
16884 ravi 50 70000 2
Expected output (name and deptid of every employee who earns more than their department's average):
ramya 1
raj 2
ravi 2
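For the sample input above, the department averages work out as follows. Here $\bar{s}_d$ is notation introduced only for this check and stands for the mean salary of department $d$:

```latex
\begin{aligned}
\bar{s}_1 &= \frac{8000 + 10000}{2} = 9000 \\
\bar{s}_2 &= \frac{40000 + 60000 + 70000}{3} \approx 56666.67
\end{aligned}
```

In department 1, ramya (10000) is above 9000 and mohan (8000) is not. In department 2, raj (60000) and ravi (70000) are above 56666.67 and krishna (40000) is not. That leaves exactly the three rows in the expected output.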
Please help, thanks.
Answer 0 (score: 0)
JOIN A and C on deptid = grpId, then FILTER the joined rows where salary > grpAvgSal.
A = LOAD 'Assignment_1_Input.log' USING PigStorage('\t') as (id:int,name:chararray,age:int,salary:int,deptid:int);
B = GROUP A by deptid;
STORE B INTO 'Assign1GrpByNew';
C = FOREACH B GENERATE group as grpId,AVG(A.salary) as grpAvgSal;
D = JOIN A BY deptid,C BY grpId;
E = FILTER D BY (A::salary > C::grpAvgSal);
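-- keep only the columns shown in the expected output
F = FOREACH E GENERATE A::name AS name, A::deptid AS deptid;
DUMP F;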
Answer 1 (score: 0)
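GROUP BY deptid, attach the department's average salary to every employee record, and keep only the employees whose salary is greater than that average. This avoids a separate JOIN.
Code snippet: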
inp_data = LOAD 'Assignment_1_Input.log' USING PigStorage('\t') as (id:int,name:chararray,age:int,salary:int,deptid:int);
inp_data_fmt = FOREACH(GROUP inp_data BY deptid) GENERATE FLATTEN(inp_data), AVG(inp_data.salary) AS avg_salary;
req_data = FILTER inp_data_fmt BY salary > avg_salary;
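-- project down to name and deptid to match the expected output
req_names = FOREACH req_data GENERATE name, deptid;
DUMP req_names;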