I am trying to generate aggregated output. Which is the better approach:
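-- PARALLEL requires a reducer count; $NUM_REDUCERS is a placeholder parameter, not a value from the original script
A_GROUP = GROUP A BY ID PARALLEL $NUM_REDUCERS;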
A_COUNT = FOREACH A_GROUP {
    A_TMP1 = FILTER A BY Col1 == 'Other';
    A_TMP2 = FILTER A BY Col2 == 'Other';
    cnt_fltrCol1 = COUNT(A_TMP1);
    cnt_fltrCol2 = COUNT(A_TMP2);
    GENERATE group, cnt_fltrCol1, cnt_fltrCol2;
}
Or:
A_FOREACH = FOREACH A GENERATE *,
    ((Col1 == 'Other') ? 1 : 0) AS fltrCol1,
    ((Col2 == 'Other') ? 1 : 0) AS fltrCol2;
A_GRP = GROUP A_FOREACH BY ID;
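A_COUNT = FOREACH A_GRP {
    -- after GROUP, the flag columns live inside the A_FOREACH bag, so they must be projected from it
    cnt_fltrCol1 = SUM(A_FOREACH.fltrCol1);
    cnt_fltrCol2 = SUM(A_FOREACH.fltrCol2);
    GENERATE group, cnt_fltrCol1, cnt_fltrCol2;
}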
At the moment I am running into memory problems (my real script is much larger). Thanks in advance for your answers.