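I am trying to compute a sum over my data, but Pig does not accept my explicit cast. I also tried replacing (int) with (double) when performing the sum.
Code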
drivers = LOAD '/sachin/drivers.csv' USING PigStorage(',');
time = LOAD '/sachin/timesheet.csv' USING PigStorage(',');
drivdata = FILTER drivers BY $0>1;
timedata = filter time by $0>0;
drivgrp = group timedata by $0;
drivinfo = foreach drivgrp generate group as id , SUM(timedata.$2) as totalhr , SUM(timedata.$3) as totmillogged;
drivfinal = foreach drivdata generate $0 as id , $1 as name;
result = join drivfinal by id , drivinfo by id;
finalres = foreach result generate $0 as id, $1 as name, $3 as hrslogged, $4 as mileslogged;
summile = foreach finalres generate (int)SUM(mileslogged);
DUMP summile;
Error message
grunt> exec /home/sachin/sec.pig
2017-12-13 21:57:58,812 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 1 time(s).
2017-12-13 21:57:58,854 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:58,996 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,036 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,080 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,121 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,192 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,246 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: <line 10, column 41> Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.
Details at logfile: /home/sachin/pig_1513175202309.log
grunt>
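What I am actually trying to do is take the top 5 drivers, find the miles each one logged along with the ratio of that driver's total miles to the total miles logged by all drivers, and store the result in HDFS.
Dataset link: https://raw.githubusercontent.com/hortonworks/data-tutorials/master/tutorials/hdp/how-to-process-data-with-apache-pig/assets/driver_data.zip
Can anyone help me solve this, or help me understand what is going wrong here?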
Answer 0: (score: 0)
You have to cast the miles logged first and then call the SUM function.
finalres = foreach result generate $0 as id, $1 as name, $3 as hrslogged, (int)$4 as mileslogged;
summile = foreach finalres generate SUM(mileslogged);
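Also, I noticed that you have not specified data types in the LOAD statements. The default data type is bytearray, and I doubt you will get correct results unless you explicitly cast the fields in the subsequent steps.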
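As a rough sketch of what declaring types at load time could look like: the field names and types below are assumptions inferred from the tutorial dataset, not something given in the question, so adjust them to match your files.

```pig
-- Assumed column names and types (hypothetical; check them against the CSV headers).
drivers = LOAD '/sachin/drivers.csv' USING PigStorage(',')
          AS (driverId:int, name:chararray, ssn:chararray,
              location:chararray, certified:chararray, wagePlan:chararray);
time = LOAD '/sachin/timesheet.csv' USING PigStorage(',')
       AS (driverId:int, week:int, hoursLogged:int, milesLogged:int);

-- A header row cannot be cast to int, so its driverId becomes null
-- and the filter drops it.
timedata = FILTER time BY driverId > 0;
drivgrp  = GROUP timedata BY driverId;

-- SUM now receives bags of typed values, so no implicit casts are needed.
drivinfo = FOREACH drivgrp GENERATE group AS id,
           SUM(timedata.hoursLogged) AS totalhr,
           SUM(timedata.milesLogged) AS totmillogged;
```

With typed fields, the IMPLICIT_CAST_TO_INT warnings from the filters should no longer appear, and fields can be referenced by name instead of by position.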
Answer 1: (score: 0)
From
http://pig.apache.org/docs/r0.17.0/func.html#sum
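SUM is defined as
Computes the sum of the numeric values in a single-column bag. SUM requires a preceding GROUP ALL statement for global sums and a GROUP BY statement for group sums.
Your code passes a double, whereas SUM needs a BAG of doubles. No typecasting is needed, but you do need to group before calling the SUM function.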
allres = group finalres ALL;
summile = foreach allres generate SUM(finalres.mileslogged);
DUMP summile;
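Building on the same grouping idea, the rest of the goal described in the question (top 5 drivers, each driver's share of the total miles, result stored in HDFS) might be sketched as follows. This assumes finalres has the fields id, name, hrslogged and mileslogged as defined above; the alias names and the output path are hypothetical.

```pig
-- Global total: GROUP ALL yields a single row holding one bag of all records.
allres = GROUP finalres ALL;
total  = FOREACH allres GENERATE SUM(finalres.mileslogged) AS totalmiles;

-- total has exactly one row, so total.totalmiles can be used as a scalar.
withratio = FOREACH finalres GENERATE id, name, mileslogged,
            (double)mileslogged / (double)total.totalmiles AS mileshare;

-- Keep the five drivers with the most miles logged.
ordered = ORDER withratio BY mileslogged DESC;
top5    = LIMIT ordered 5;

-- Hypothetical output directory; it must not already exist in HDFS.
STORE top5 INTO '/sachin/top5_drivers' USING PigStorage(',');
```

The scalar reference only works because the GROUP ALL relation contains a single row; Pig raises a runtime error if a relation used as a scalar has more than one row.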