I'm using the ternary operator to conditionally include values in a SUM() operation. This is how I'm doing it:
GROUPED = GROUP ALL_MERGED BY (fld1, fld2, fld3);
REPORT_DATA = FOREACH GROUPED
{ GENERATE group,
SUM(GROUPED.fld4 == 'S' ? GROUPED.fld5 : 0) AS sum1,
SUM(GROUPED.fld4 == 'S' ? GROUPED.fld5 : (GROUPED.fld5 * -1)) AS sum2;
}
The schema of ALL_MERGED is:
{ALL_MERGED: {fld1:chararray, fld2:chararray, fld3:chararray, fld4:chararray, fld5:int}}
When I run this, I get the following error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: SUM in {group: (fld1:chararray, fld2:chararray, fld3:chararray), ALL_MERGED: {fld1:chararray, fld2:chararray, fld3:chararray, fld4:chararray: fld5:int}}
What am I doing wrong here?
Answer 0 (score: 2)
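SUM is a UDF that takes a bag as input. There are several problems with what you're doing, and I suspect it would help you to work through a good Pig reference. I recommend Programming Pig, which is available free online. First, GROUPED has two fields: a tuple named group and a bag named ALL_MERGED. That is what the error message is trying to tell you. (I say "trying" because Pig error messages are often very cryptic.)
Also, you can't pass expressions to a UDF the way you were hoping. Inside FOREACH GROUPED, a projection such as ALL_MERGED.fld4 is a whole bag of values, not a single value, so the ternary operator can't be applied to it. Instead, you have to GENERATE those fields first and then pass them in. Try this: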
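One way to see this structure for yourself is Pig's DESCRIBE operator, which prints the schema of an alias. The following is a minimal sketch that assumes the ALL_MERGED relation from the question. The exact text DESCRIBE prints varies between Pig versions, so no expected output is shown here.

```pig
GROUPED = GROUP ALL_MERGED BY (fld1, fld2, fld3);
-- Prints the schema of GROUPED: a tuple field named group holding the three
-- grouping keys, plus a bag named ALL_MERGED holding every input record of that group
DESCRIBE GROUPED;
```

Running this before writing the FOREACH shows which names are actually in scope inside it. They are group and ALL_MERGED, not GROUPED.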
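-- Compute the conditional values per record, before grouping
ALL_MERGED_2 =
    FOREACH ALL_MERGED
    GENERATE
        fld1 .. fld5,
        ((fld4 == 'S') ? fld5 : 0) AS sum_me1,
        ((fld4 == 'S') ? fld5 : fld5*-1) AS sum_me2;

GROUPED = GROUP ALL_MERGED_2 BY (fld1, fld2, fld3);

-- Each group now holds a bag named ALL_MERGED_2; SUM takes one projected column of that bag
DATA =
    FOREACH GROUPED
    GENERATE
        group,
        SUM(ALL_MERGED_2.sum_me1) AS sum1,
        SUM(ALL_MERGED_2.sum_me2) AS sum2;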
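You may prefer to keep the conditional logic next to the aggregation. If so, newer Pig releases (0.10 and later, as far as I know) also accept a FOREACH inside a nested FOREACH block. The following is a sketch under that version assumption. The names adjusted, v1 and v2 are hypothetical. The idea is the same as above: build a bag of already-computed values first, then hand a projected column of it to SUM.

```pig
GROUPED = GROUP ALL_MERGED BY (fld1, fld2, fld3);

REPORT_DATA = FOREACH GROUPED {
    -- Hypothetical inner relation: one row of computed values per record in the group's bag
    adjusted = FOREACH ALL_MERGED GENERATE
        ((fld4 == 'S') ? fld5 : 0) AS v1,
        ((fld4 == 'S') ? fld5 : fld5 * -1) AS v2;
    -- SUM receives a projected bag column, never a bare expression
    GENERATE group, SUM(adjusted.v1) AS sum1, SUM(adjusted.v2) AS sum2;
}
```

The two-step version above works on older Pig versions as well. This variant only avoids the extra top-level alias.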