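While implementing the solution from How to optimize a group by statement in PIG latin?, I found that every row with a null in one of the join columns gets dropped. This is expected behavior in Pig. Would the code below work?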
A = B join by ( Bcol1 is null?'UNK',Bcol2 is null?'UNK',Bcol2 is null?999),
C join by ( Ccol1 is null?'UNK',Ccol2 is null?'UNK',Ccol2 is null?999)
I get parse errors.
Answer 0 (score: 3)
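A null join key never matches anything in a Pig join, so those rows disappear. The fix is to replace the nulls with sentinel values before the join, not inside the JOIN statement. Pig is a dataflow scripting language, so adding an extra FOREACH ... GENERATE to replace the nulls does not cause an extra MapReduce job. The FOREACH runs in the map phase of the same job that performs the join.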
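-- replace nulls with sentinel values; every bincond is parenthesized and given an alias so the join can refer to it
B = foreach B generate ....., ((Bcol1 is null) ? 'UNK' : Bcol1) as Bcol1, ((Bcol2 is null) ? 'UNK' : Bcol2) as Bcol2, ((Bcol3 is null) ? 999 : Bcol3) as Bcol3;
C = foreach C generate ....., ((Ccol1 is null) ? 'UNK' : Ccol1) as Ccol1, ((Ccol2 is null) ? 'UNK' : Ccol2) as Ccol2, ((Ccol3 is null) ? 999 : Ccol3) as Ccol3;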
A = join B by (Bcol1, Bcol2, Bcol3), C by (Ccol1, Ccol2, Ccol3);
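For reference, here is a self-contained sketch of the same approach. It assumes hypothetical tab-separated input files `b.tsv` and `c.tsv` and an assumed schema. The names `bval` and `cval`, the file paths and the field types are illustrative and do not come from the question. The sentinels `'UNK'` and `999` must not occur in the real data. Otherwise genuine values and former nulls will be joined together.

```pig
-- Hypothetical inputs and schema; adjust paths, delimiters and types to your data.
B = LOAD 'b.tsv' USING PigStorage('\t')
    AS (bval:chararray, Bcol1:chararray, Bcol2:chararray, Bcol3:int);
C = LOAD 'c.tsv' USING PigStorage('\t')
    AS (cval:chararray, Ccol1:chararray, Ccol2:chararray, Ccol3:int);

-- Replace null join keys with sentinel values, because a null key never matches in a join.
B2 = FOREACH B GENERATE bval,
        ((Bcol1 IS NULL) ? 'UNK' : Bcol1) AS Bcol1,
        ((Bcol2 IS NULL) ? 'UNK' : Bcol2) AS Bcol2,
        ((Bcol3 IS NULL) ? 999 : Bcol3) AS Bcol3;
C2 = FOREACH C GENERATE cval,
        ((Ccol1 IS NULL) ? 'UNK' : Ccol1) AS Ccol1,
        ((Ccol2 IS NULL) ? 'UNK' : Ccol2) AS Ccol2,
        ((Ccol3 IS NULL) ? 999 : Ccol3) AS Ccol3;

-- Multi-column inner join on the cleaned keys.
A = JOIN B2 BY (Bcol1, Bcol2, Bcol3), C2 BY (Ccol1, Ccol2, Ccol3);

-- EXPLAIN prints the execution plans. Use it to check that the FOREACH steps
-- are folded into the join's job rather than run as a separate job.
EXPLAIN A;
DUMP A;
```

Because rows that were null on both sides now carry the same sentinel, they match each other. This effectively treats null as equal to null for this join.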