Pig script function question

Time: 2011-06-29 06:44:58

Tags: function apache-pig

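As the Pig code below shows, I am repeating the same set of statements for Attr1 and Attr2. Is there a way to extract them into a function? A code example would really help.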

Attr1ValidRecs = FILTER BaseRecs BY Attr1 IS NOT NULL;
Attr1ValidRecs_all = GROUP Attr1ValidRecs ALL;
Attr1Count = FOREACH Attr1ValidRecs_all GENERATE COUNT(Attr1ValidRecs);
Attr1CountStr = FOREACH Attr1Count GENERATE CONCAT('Recs with Attr1 not null : ',(chararray)$0);

Attr1BaseCross = CROSS BaseRecsCount,Attr1Count;
Attr1BaseRatio = FOREACH Attr1BaseCross GENERATE CONCAT('Ratio of Not Null Attr1 to Total Base Recs: ',(chararray)((double)$1/(double)$0));

Attr2ValidRecs = FILTER BaseRecs BY Attr2 IS NOT NULL;
Attr2ValidRecs_all = GROUP Attr2ValidRecs ALL;
Attr2Count = FOREACH Attr2ValidRecs_all GENERATE COUNT(Attr2ValidRecs);
Attr2CountStr = FOREACH Attr2Count GENERATE CONCAT('Recs with Attr2 not null : ',(chararray)$0);

Attr2BaseCross = CROSS BaseRecsCount,Attr2Count;
Attr2BaseRatio = FOREACH Attr2BaseCross GENERATE CONCAT('Ratio of Not Null Attr2 to Total Base Recs: ',(chararray)((double)$1/(double)$0));

1 Answer:

Answer 0 (score: 0)

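Unfortunately, you can't replace multiple lines with a single reusable batch of Pig operations. It's something I've wished I could do at times, so I sympathize.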

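What I have done in the past, when I found myself repeating the same thing over and over in one script, is generate the Pig Latin code (or anything else, obviously) with Python, using a for loop to substitute certain keywords. It's still pretty dirty, though.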
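A minimal sketch of that keyword-substitution approach follows. It assumes the hand-written part of the script already defines BaseRecs and BaseRecsCount, as in the question. The placeholder token __ATTR__ and the function name generate_pig are hypothetical. Plain str.replace is used instead of string.Template because Pig's positional references such as $0 and $1 would collide with Template's $-syntax.

```python
# Minimal sketch: emit the repeated Pig Latin block once per attribute by
# substituting a placeholder keyword in a template string.
# Assumes BaseRecs and BaseRecsCount are defined elsewhere in the Pig script.

# Hypothetical placeholder token; any string that never appears elsewhere
# in the Pig code would work.
PLACEHOLDER = "__ATTR__"

# The question's repeated statements, with the attribute name replaced by
# the placeholder. Pig's $0/$1 are left untouched because str.replace only
# touches the placeholder.
PIG_TEMPLATE = """\
__ATTR__ValidRecs = FILTER BaseRecs BY __ATTR__ IS NOT NULL;
__ATTR__ValidRecs_all = GROUP __ATTR__ValidRecs ALL;
__ATTR__Count = FOREACH __ATTR__ValidRecs_all GENERATE COUNT(__ATTR__ValidRecs);
__ATTR__CountStr = FOREACH __ATTR__Count GENERATE CONCAT('Recs with __ATTR__ not null : ',(chararray)$0);

__ATTR__BaseCross = CROSS BaseRecsCount,__ATTR__Count;
__ATTR__BaseRatio = FOREACH __ATTR__BaseCross GENERATE CONCAT('Ratio of Not Null __ATTR__ to Total Base Recs: ',(chararray)((double)$1/(double)$0));
"""


def generate_pig(attrs):
    """Return Pig Latin text containing one copy of the template per attribute."""
    blocks = [PIG_TEMPLATE.replace(PLACEHOLDER, attr) for attr in attrs]
    return "\n".join(blocks)


if __name__ == "__main__":
    # Print the generated statements; redirect stdout into a .pig file.
    print(generate_pig(["Attr1", "Attr2"]))
```

Redirecting the output into a file (for example `python gen_pig.py > generated.pig`, both names made up) and pasting or including it after the hand-written LOAD and count statements gives the same code as the question, and adding an attribute only means extending the list.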