How do I optimize a FLATTEN operation?

Date: 2014-10-22 06:47:24

Tags: join hadoop apache-pig flatten

I have a Pig script that computes 17 different outputs and merges them at the end. To merge the data, I use a COGROUP operation.
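
To make the shape of the COGROUP output concrete, here is a minimal plain-Python model of `COGROUP r1 BY ($0), r2 BY ($0), ...`. It is a sketch, not Pig itself: relations are in-memory lists of tuples, the schema `(key, label, value_a, value_b)` and the names `count_smart` / `count_modem` data are hypothetical, and the output is sorted only for readability (real Pig output is unordered). The model shows that each bag keeps whole input tuples, including the key, so the key is repeated inside every bag.

```python
from collections import defaultdict

def cogroup(*relations):
    """Model COGROUP r1 BY ($0), r2 BY ($0), ... on in-memory lists of tuples.

    Returns one tuple (key, bag_1, ..., bag_n) per key; bag_i holds every
    tuple of relation i whose field 0 equals key, and is empty if none does.
    """
    groups = defaultdict(lambda: [[] for _ in relations])
    for i, relation in enumerate(relations):
        for t in relation:
            groups[t[0]][i].append(t)
    # Sorted by key only to make the printed output deterministic.
    return [(key, *bags) for key, bags in sorted(groups.items())]

# Hypothetical inputs with schema (key, label, value_a, value_b).
count_smart = [("cell1", "smart", 10, 5), ("cell2", "smart", 3, 1)]
count_modem = [("cell1", "modem", 4, 2)]

for row in cogroup(count_smart, count_modem):
    print(row)
# ('cell1', [('cell1', 'smart', 10, 5)], [('cell1', 'modem', 4, 2)])
# ('cell2', [('cell2', 'smart', 3, 1)], [])
```

A key missing from one input still appears, with an empty bag for that input. This is why empty bags matter in the flattening step.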

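Because the COGROUP output repeats the join key inside the bag of every input, I have to drop some unnecessary columns. That is why there is a FLATTEN step.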
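
One property of FLATTEN is relevant here. Per the Pig documentation, when several bags are flattened in the same GENERATE, Pig emits the cross product of their elements, and flattening an empty bag discards the row. The sketch below models this in plain Python. `project` and `generate_flatten` are hypothetical helpers that mimic `$i.$j` and `GENERATE $0, FLATTEN(...), ...`, and the sample row is made up. The model shows that flattening two projections of the same bag, as in `FLATTEN($1.$2), FLATTEN($1.$3)`, already crosses the bag with itself.

```python
from itertools import product

def project(bag_index, field):
    """Model the Pig projection $bag_index.$field on a cogrouped row.

    Returns a function that maps the row to a bag (list) of that field's values.
    """
    return lambda row: [t[field] for t in row[bag_index]]

def generate_flatten(row, *projections):
    """Model FOREACH ... GENERATE $0, FLATTEN(p1), FLATTEN(p2), ... for one row.

    One output row is emitted per element of the cross product of all
    flattened bags; if any flattened bag is empty, nothing is emitted.
    """
    bags = [p(row) for p in projections]
    return [(row[0], *combo) for combo in product(*bags)]

# Hypothetical cogrouped row: the key plus one bag holding three tuples.
row = ("cell1", [("cell1", "smart", 10, 5),
                 ("cell1", "smart", 7, 2),
                 ("cell1", "smart", 1, 1)])

# FLATTEN($1.$2), FLATTEN($1.$3): two flattens of the same 3-tuple bag.
out = generate_flatten(row, project(1, 2), project(1, 3))
print(len(out))  # 9 rows (3 x 3), not 3
```

The pairing of values from the same tuple is also lost: `(10, 2)` appears next to `(10, 5)`. CG_FLAT below flattens 37 bag projections in one GENERATE. The per-key row count is therefore the product of all their sizes. For example, if every one of those bags held only two tuples, that would be 2^37, or about 1.4 × 10^11, rows per key.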

However, the script hangs at 95% and never finishes. When I drop the last part (CG_FLAT), it runs perfectly fine.

So I need to optimize the flattening part somehow. Any ideas?

    -- REST OF THE SCRIPT
    -- ...
    -- ...
    CG = cogroup countSmart by ($0),
            countModem by ($0),
            countTablet by ($0),
            countOther by ($0),
            count2G by ($0),
            count3G by ($0),
            countUMTS900 by ($0),
            countGPRS by ($0),
            countEDGE by ($0),
            countR99 by ($0),
            countHSDPA_432 by ($0),
            countHSDPA_288 by ($0),
            countHSDPA_216 by ($0),
            countHSDPA_144 by ($0),
            countHSDPA_72 by ($0),
            countHSDPA_36 by ($0),
            countHSDPA_Unknown by ($0);

    CG_FLAT = foreach CG generate
            flatten($0),
            FLATTEN((IsEmpty($1.$2) ? null : $1.$2)), FLATTEN((IsEmpty($1.$3) ? null : $1.$3)),
            FLATTEN((IsEmpty($2.$2) ? null : $2.$2)), FLATTEN((IsEmpty($2.$3) ? null : $2.$3)),
            FLATTEN((IsEmpty($3.$2) ? null : $3.$2)), FLATTEN((IsEmpty($3.$3) ? null : $3.$3)),
            FLATTEN((IsEmpty($4.$2) ? null : $4.$2)), FLATTEN((IsEmpty($4.$3) ? null : $4.$3)),
            FLATTEN((IsEmpty($1.$2) ? null : $1.$2)), FLATTEN((IsEmpty($1.$3) ? null : $1.$3)), FLATTEN((IsEmpty($1.$4) ? null : $1.$4)),
            FLATTEN((IsEmpty($2.$2) ? null : $2.$2)), FLATTEN((IsEmpty($2.$3) ? null : $2.$3)), FLATTEN((IsEmpty($2.$4) ? null : $2.$4)),
            FLATTEN((IsEmpty($3.$2) ? null : $3.$2)), FLATTEN((IsEmpty($3.$3) ? null : $3.$3)), FLATTEN((IsEmpty($3.$4) ? null : $3.$4)),
            FLATTEN((IsEmpty($8.$2) ? null : $8.$2)), FLATTEN((IsEmpty($8.$3) ? null : $8.$3)),
            FLATTEN((IsEmpty($9.$2) ? null : $9.$2)), FLATTEN((IsEmpty($9.$3) ? null : $9.$3)),
            FLATTEN((IsEmpty($10.$2) ? null : $10.$2)), FLATTEN((IsEmpty($10.$3) ? null : $10.$3)),
            FLATTEN((IsEmpty($11.$2) ? null : $11.$2)), FLATTEN((IsEmpty($11.$3) ? null : $11.$3)),
            FLATTEN((IsEmpty($12.$2) ? null : $12.$2)), FLATTEN((IsEmpty($12.$3) ? null : $12.$3)),
            FLATTEN((IsEmpty($13.$2) ? null : $13.$2)), FLATTEN((IsEmpty($13.$3) ? null : $13.$3)),
            FLATTEN((IsEmpty($14.$2) ? null : $14.$2)), FLATTEN((IsEmpty($14.$3) ? null : $14.$3)),
            FLATTEN((IsEmpty($15.$2) ? null : $15.$2)), FLATTEN((IsEmpty($15.$3) ? null : $15.$3)),
            FLATTEN((IsEmpty($16.$2) ? null : $16.$2)), FLATTEN((IsEmpty($16.$3) ? null : $16.$3)),
            FLATTEN((IsEmpty($17.$2) ? null : $17.$2)), FLATTEN((IsEmpty($17.$3) ? null : $17.$3));
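
The `IsEmpty(x) ? null : x` guard in CG_FLAT exists because flattening an empty bag would discard the whole row. The plain-Python sketch below models the guard, and `guard` and `flatten_row` are hypothetical names. It assumes, as the script relies on, that flattening the null substitute emits a single null field instead of dropping the row. The sample values are made up.

```python
from itertools import product

def guard(bag):
    """Model (IsEmpty(bag) ? null : bag): an empty bag becomes one null value.

    Assumption: FLATTEN of the null substitute yields exactly one null field.
    """
    return bag if bag else [None]

def flatten_row(key, *bags):
    # One output row per combination of elements drawn from the flattened bags.
    return [(key, *combo) for combo in product(*bags)]

smart = [10]  # hypothetical $1.$2 values for one key
modem = []    # no modem records for this key

print(flatten_row("cell2", smart, modem))               # []
print(flatten_row("cell2", guard(smart), guard(modem)))  # [('cell2', 10, None)]
```

The guard keeps rows with missing inputs, but it does not change the multiplication. A guarded bag with n tuples still multiplies the output by n.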
