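I am trying to use AVG, MIN, and MAX in Pig. When I run the script, the MIN and MAX functions both hang and the AVG function throws an error, but the COUNT function works fine.
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (GRADE 2 TEACHER,{(65587.90)}), 2nd :(GRADE 4 TEACHER,{(56567.24)})
My code: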
register 'pig/contrib/piggybank/java/piggybank.jar';
define Replace org.apache.pig.piggybank.evaluation.string.REPLACE();
A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage.CSVLoader() AS (name:chararray,job:chararray,salary:chararray,TA:chararray,type:chararray,org:chararray,year:int);
B = foreach A generate name,job,REPLACE(salary,',','') as salary:float, REPLACE(TA,',','') as TA:float, type, org, year;
C = filter B by type=='LBOE';
D = filter C by year==2010;
E = group D by job;
number = foreach E generate group,COUNT(D.salary);
average = foreach E generate group,AVG(D.salary);
minim = foreach E generate group,MIN(D.salary);
maxim = foreach E generate group,MAX(D.salary);
Sample data:
(ABBOTT,DEEDEE W,GRADES 9-12 TEACHER,52,122.10,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABBOTT,RYAN V,GRADE 4 TEACHER,56,567.24,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABBOUD,CLAUDIA MORA,GRADES K-5 TEACHER,63,957.50,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDUL-JABBAR,KHADEEJA ,GRADES 9-12 TEACHER,16,791.73,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDUL-RAZACQ,SALAHUD-DIN ,INSTRUCTIONAL SPECIALIST P-8,45,832.92,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDULLAH,DIANA ,SPECIAL ED PARAPRO/AIDE,10,934.94,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDULLAH,NADIYAH W,GRADES 6-8 TEACHER,75,109.92,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDULLAH,RHONDALYN Y,SPECIAL ED PARAPRO/AIDE,28,649.34,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(OSBORNE,CHRISTINE L,INSTRUCTIONAL SUPERVISOR,78,875.59,3,265.71,LBOE,COBB COUNTY SCHOOL DISTRICT,2010)
(OSBORNE,DORIS A,OCCUPATIONAL THERAPIST ,65,421.79,1,156.05,LBOE,COBB COUNTY SCHOOL DISTRICT,2010)
Sample data after the GROUP operation on line 7 of the script:
(GRADE 2 TEACHER,{(OSBORNE,VIRGINIA E,GRADE 2 TEACHER,65587.90,0,LBOE,COBB COUNTY SCHOOL DISTRICT,2010)})
(GRADE 4 TEACHER,{(ABBOTT,RYAN V,GRADE 4 TEACHER,56567.24,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)})
(MAINTENANCE PERSONNEL,{(BROOKS,RICHARD M,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(SUMNER,ROBERT O,MAINTENANCE PERSONNEL,72655.53,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(MCCULLOUGH,ALVIN J,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(DALTON,JAMES E,MAINTENANCE PERSONNEL,72655.52,2124.60,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(SMITH,KEVIN W,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(MANGHAM,LARRY G,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010)})
Is this a bug in Pig? Please help.
Answer 0 (score: 1)
Here is the updated Pig script.
register 'pig/contrib/piggybank/java/piggybank.jar';
define Replace org.apache.pig.piggybank.evaluation.string.REPLACE();
A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage.CSVLoader() AS (name:chararray,job:chararray,salary:chararray,TA:chararray,type:chararray,org:chararray,year:int);
B = foreach A generate name,job,REPLACE(salary,',','') as salary, REPLACE(TA,',','') as TA, type, org, year;
B1 = foreach B generate name, job, (double)salary, (double)TA, type, org, year;
C = filter B1 by type=='LBOE';
D = filter C by year==2010;
E = group D by job;
number = foreach E generate group,COUNT(D.salary);
average = foreach E generate group,AVG(D.salary);
minim = foreach E generate group,MIN(D.salary);
maxim = foreach E generate group,MAX(D.salary);
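The problem is that you need to provide explicit casts for the salary and TA attributes. REPLACE returns a chararray, and the `as salary:float` clause in the original script only declares a type in the schema; it does not convert the value. The separate foreach with (double)salary and (double)TA performs the actual conversion before the aggregate functions run.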
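One way to check that the casts took effect is to append something like the following to the updated script. This is only a sketch: it assumes the relations B1 and E defined above, and `stats` plus the `as` aliases (job, n, avg_salary, min_salary, max_salary) are names introduced here for illustration.

```pig
-- Print the schema of B1; after the explicit casts,
-- salary and TA should be reported as double.
describe B1;

-- Compute all four aggregates per job in one pass over the grouped relation E.
-- The alias names below are hypothetical and not part of the original script.
stats = foreach E generate
    group as job,
    COUNT(D.salary) as n,
    AVG(D.salary) as avg_salary,
    MIN(D.salary) as min_salary,
    MAX(D.salary) as max_salary;

dump stats;
```

Putting the aggregates in a single foreach keeps each job's results on one row, which makes it easier to spot a group whose values look wrong.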