I'm using a Python UDF, and it fails in the reduce phase with the following error:
java.lang.ClassCastException: java.lang.Double cannot be cast to org.apache.pig.data.DataByteArray
Here is the code for the UDF:
import math

outputSchema("score:double")
def confidenceLowerBound(numerator, denominator, constant):
    raw_score = numerator * 1.0 / denominator
    normalized_interval = math.sqrt(raw_score * (1 - raw_score) / denominator)
    wilson_score = raw_score - constant * normalized_interval
    return wilson_score
And this is how I call the UDF from Pig:
register 'confidence_interval_compute.py' using jython as pyutils;
...
..
A = FOREACH A GENERATE $0, $1, $2, $3, $4, pyutils.confidenceLowerBound($3, $4, 4) AS score PARALLEL 20;
Answer 0 (score: 0)
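As @Ian Stevents pointed out in the comments, this is because you have a typo in the decorator: the leading @ is missing. Without it, outputSchema("score:double") is just a function call whose result is discarded, so the function never gets a declared schema and Pig treats its return value as a bytearray. That is where the cast from Double to DataByteArray fails.
You should use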
@outputSchema("score:double")
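For reference, here is a minimal sketch of what the whole corrected UDF file might look like. The function body is unchanged from the question. The try/except shim is my own addition and only an assumption for illustration: it defines a stand-in outputSchema so the file can also be run and checked under plain Python, where Pig's Jython runtime is not there to supply the real decorator.

```python
import math

try:
    # Inside Pig's Jython runtime the decorator is already available.
    outputSchema
except NameError:
    # Hypothetical stand-in (not part of Pig): lets the file run under
    # plain Python by simply recording the schema string on the function.
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("score:double")  # note the leading '@'
def confidenceLowerBound(numerator, denominator, constant):
    # Observed proportion, forced to floating-point division.
    raw_score = numerator * 1.0 / denominator
    # Standard-error term for that proportion.
    normalized_interval = math.sqrt(raw_score * (1 - raw_score) / denominator)
    # Lower bound: proportion minus 'constant' standard errors.
    wilson_score = raw_score - constant * normalized_interval
    return wilson_score
```

The register and FOREACH statements from the question should not need to change. Calling confidenceLowerBound(1, 4, 0) locally returns 0.25, which is a quick way to check the arithmetic before submitting the job.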