I am new to Pig Latin and am trying to solve the following problem:
Find the number of employees who have a phone number in each area code.
EMPID ADD_ID ZIP SAL PHONE DAT
Abcd411 PbcDr60264 953492 46404 111-432-4193 20150113
Abcd874 PbcDr39353 186307 29873 100-432-9164 20150728
Abcd197 PbcDr46725 306185 31908 113-432-4191 20150410
Abcd160 PbcDr77738 330533 61313 105-432-2468 20151007
Abcd327 PbcDr10034 951703 39301 109-432-9235 20150805
Abcd172 PbcDr21679 683299 71686 105-432-5616 20150908
Abcd227 PbcDr57694 876619 46743 109-432-9181 20151101
Abcd900 PbcDr80166 970136 34242 105-432-7415 20150820
Abcd318 PbcDr34711 234066 10989 101-432-9667 20150906
Abcd702 PbcDr86734 997954 97688 105-432-6592 20151026
This is how I tried to solve it:
empdata = LOAD '/home/cloudera/empData.txt' as (empId:chararray, location:chararray, zipCode:long , salary:long, phone:chararray, dateOfJoin:long);
grpdata = GROUP empdata by SUBSTRING(phone, 0, INDEXOF(phone, '-' , 0));
dataCnt = foreach grpdata generate count(grpdata);
But I get the following error: Invalid scalar projection: grpdata : A column needs to be projected from a relation for it to be used as a scalar
A second problem statement on the same data set:
Find the number of employees whose date of joining is between 2015-01-01 and 2015-05-28.
I tried the following solution, but this time I get no results at all.
empdata = LOAD '/home/cloudera/empData.txt' as (empId:chararray, location:chararray, zipCode:long , salary:long, phone:chararray, doj:chararray);
filtDate = filter empdata by ToDate(doj, 'yyyyMMdd') >= ToDate('20150101', 'yyyymmdd') AND ToDate(doj, 'yyyyMMdd') <= ToDate('20150528', 'yyyymmdd');
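Please help explain what is going wrong.
Answer 0: (score: 1)
Try this. The USING clause goes before AS, and COUNT takes the empdata bag rather than the grouped relation:
empdata = LOAD '/home/cloudera/empData.txt' using PigStorage(' ') as (empId:chararray, location:chararray, zipCode:long , salary:long, phone:chararray, dateOfJoin:long);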
grpdata = GROUP empdata by SUBSTRING(phone, 0, INDEXOF(phone, '-' , 0));
dataCnt = foreach grpdata generate $0, COUNT(empdata);
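Some context on why this works: after a GROUP, each record of grpdata has two fields, the key (named group) and a bag named after the relation that was grouped (empdata here). COUNT expects that bag, which is why passing grpdata triggers the scalar projection error. Below is a minimal sketch of the whole script with comments. It assumes the file is space-delimited and does not contain the header row; the field aliases areaCode and numEmployees are only illustrative names, not part of the original.

```pig
-- Assumption: fields are separated by a single space (the default delimiter is tab).
empdata = LOAD '/home/cloudera/empData.txt' USING PigStorage(' ')
          AS (empId:chararray, location:chararray, zipCode:long,
              salary:long, phone:chararray, dateOfJoin:long);

-- Group by everything before the first '-' in phone, i.e. the area code.
grpdata = GROUP empdata BY SUBSTRING(phone, 0, INDEXOF(phone, '-', 0));

-- Each grpdata record is (group, empdata:{...}); count the bag, not the relation.
dataCnt = FOREACH grpdata GENERATE group AS areaCode, COUNT(empdata) AS numEmployees;

DUMP dataCnt;
```

On the ten sample rows in the question, this should give 4 for area code 105, 2 for 109, and 1 each for 100, 101, 111 and 113.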
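Answer 1: (score: 0)
You should count the empdata bag, not grpdata. Also note that built-in function names are case-sensitive in Pig, so it has to be COUNT rather than count: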
dataCnt = foreach grpdata generate COUNT(empdata);
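The date filter from the second part of the question is not covered by the snippets above, so here is a hedged sketch of one way to write it. ToDate patterns follow Joda-Time conventions, where 'MM' is month of year and 'mm' is minute of hour, so 'yyyymmdd' on the right-hand side does not parse the bounds as intended. The missing delimiter in LOAD may also leave doj null for every row. The sketch assumes a space-delimited file without a header row and a Pig version that has the datetime type (0.11 or later); grpAll, dateCnt and numEmployees are illustrative names.

```pig
-- Assumption: space-delimited file; doj is read as chararray so ToDate can parse it.
empdata = LOAD '/home/cloudera/empData.txt' USING PigStorage(' ')
          AS (empId:chararray, location:chararray, zipCode:long,
              salary:long, phone:chararray, doj:chararray);

-- Use 'MM' (month) on both sides of each comparison; 'mm' would mean minutes.
filtDate = FILTER empdata BY
    ToDate(doj, 'yyyyMMdd') >= ToDate('20150101', 'yyyyMMdd') AND
    ToDate(doj, 'yyyyMMdd') <= ToDate('20150528', 'yyyyMMdd');

-- Put all filtered rows into a single group and count them.
grpAll  = GROUP filtDate ALL;
dateCnt = FOREACH grpAll GENERATE COUNT(filtDate) AS numEmployees;

DUMP dateCnt;
```

Because the values are already in yyyyMMdd form, comparing doj directly as a chararray (doj >= '20150101' AND doj <= '20150528') should select the same rows without any date parsing. In the sample data, only Abcd411 (20150113) and Abcd197 (20150410) fall inside the range.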