Apache Pig: FLATTEN of a UDF-generated bag followed by a JOIN causes ClassCastException

Time: 2016-04-26 13:17:14

Tags: python apache-pig

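I am trying to flatten a bag of values into multiple records in Pig, and then join those records with other records that contain the flattened values. However, my attempts result in a ClassCastException (Pig DataByteArray to Java Integer). Here is an MWE that reproduces the problem.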

Input files

File a: a.txt

123, fruit1
234, fruit2
345, fruit3
783, fruit4
928, fruit5
317, fruit6
937, fruit7

File b: b.txt

global23; [num1#123,num2#234]
global45; [num1#783,num2#928,num3#317]

Python UDF: udf.py

@outputSchema("values:bag{t:tuple(value:int)}")
def bag_of_tuples(map_dict):
    return map_dict.values()

Pig script:

REGISTER 'udf.py' using jython as udf;

a = LOAD 'a.txt' using PigStorage(',') AS (num: int, fruit: chararray);
b = LOAD 'b.txt' using PigStorage(';') AS (global: chararray, mymap: map[]);
c = FOREACH b GENERATE global AS (global: chararray), FLATTEN(udf.bag_of_tuples(mymap)) AS (othernum: int);

d = JOIN a BY num, c BY othernum;
DUMP d;

Expected result

Joined records, for example:

num, fruit, global, othernum
(123, fruit1, global23, 123)
(234, fruit2, global23, 234)
...
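
To make the expected result concrete, here is a plain-Python sketch of the same FLATTEN and inner JOIN over the sample data above. It does not run Pig. It only mimics the intended semantics, and it assumes the map values have already been parsed as ints. The rows are sorted purely for stable display, since Pig does not guarantee output order.

```python
# Rows of a.txt as (num, fruit) and rows of b.txt as (global, mymap).
a = [(123, "fruit1"), (234, "fruit2"), (345, "fruit3"), (783, "fruit4"),
     (928, "fruit5"), (317, "fruit6"), (937, "fruit7")]
b = [("global23", {"num1": 123, "num2": 234}),
     ("global45", {"num1": 783, "num2": 928, "num3": 317})]

# FLATTEN: emit one (global, othernum) record per value in the map.
c = [(g, n) for g, m in b for n in m.values()]

# JOIN a BY num, c BY othernum (inner join on equal keys).
d = sorted((num, fruit, g, othernum)
           for num, fruit in a
           for g, othernum in c
           if num == othernum)

for row in d:
    print(row)
```

Rows 345 and 937 from a.txt have no match in b.txt, so they should not appear in the joined output.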

Any ideas? This might be a bug.
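
One direction that may be worth checking, offered as an untested guess and not a confirmed fix: values in an untyped map[] are bytearray in Pig, and @outputSchema only declares a type without converting anything. The UDF may therefore be returning values that are not really ints, which could explain the cast failure at the JOIN. The sketch below converts each value explicitly and wraps it in a one-field tuple. to_int is a hypothetical helper, the fallback outputSchema stub exists only so the file also runs outside Pig, and the form in which Pig hands bytearray map values to Jython (assumed here to be text or a byte string) has not been verified.

```python
try:
    outputSchema  # provided by Pig when the file is registered via jython
except NameError:
    def outputSchema(schema):
        # Stand-in decorator so the module can be imported outside Pig.
        def wrap(func):
            return func
        return wrap


def to_int(value):
    # Hypothetical helper: accepts ints, text, or byte strings such as b"123".
    if isinstance(value, (bytes, bytearray)):
        value = bytes(value).decode("ascii")
    return int(value)


@outputSchema("values:bag{t:tuple(value:int)}")
def bag_of_tuples(map_dict):
    # Return a list of one-field tuples so the data matches the declared schema.
    return [(to_int(v),) for v in map_dict.values()]
```

If the untyped map is indeed the cause, declaring the field as map[int] in the LOAD statement might be an alternative to try.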

0 answers:

No answers