I am iterating over the following relation in a foreach:
grunt> describe METRICS_SOURCE_WITH_CNT
METRICS_SOURCE_WITH_CNT:
{group: (hostname: chararray,site_guid: chararray,timestamp: long),
JOIN_FIELDS_ONLY: {(timestamp: long, unique_pageviews: long)},cnt: long}
Note that cnt is the total number of tuples.
METRICS_SOURCE_TOP3 = foreach METRICS_SOURCE_WITH_CNT {
    SORTED = ORDER JOIN_FIELDS_ONLY by unique_pageviews DESC;
    TOPK = LIMIT SORTED 10;
    REVSORTED = ORDER JOIN_FIELDS_ONLY by unique_pageviews ASC;
    BOTTOMK = LIMIT REVSORTED cnt;
    generate TOPK, BOTTOMK;
}
But when I apply the second LIMIT, Pig seems to think the cnt field belongs to REVSORTED, when it is actually a "parent" field:
Invalid field projection. Projected field [cnt] does not exist in schema: timestamp:long,....
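I tried referencing the field by position ($x), but that does not work either. Pig always assumes the referenced field belongs to the relation being LIMIT'd.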
Answer 0 (score: 1)
You need to use the Pig dereference operator, which lets you reference the parent with a dot (.). Applied to your example:
METRICS_SOURCE_TOP3 = foreach METRICS_SOURCE_WITH_CNT {
    SORTED = ORDER JOIN_FIELDS_ONLY by unique_pageviews DESC;
    TOPK = LIMIT SORTED 10;
    REVSORTED = ORDER JOIN_FIELDS_ONLY by unique_pageviews ASC;
    -- dereference cnt through the parent relation rather than REVSORTED
    BOTTOMK = LIMIT REVSORTED METRICS_SOURCE_WITH_CNT.cnt;
    generate TOPK, BOTTOMK;
}
Also worth noting: before Pig 0.10, scalars were not supported in LIMIT statements, so a statement like this will fail on older versions.
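
For comparison, here is a minimal sketch of a scalar expression in a top-level LIMIT, the form that Pig 0.10 and later accept. The input file a.txt and the aliases a through e are placeholders for illustration, not part of the question's script; the scalar reference c.total relies on c containing exactly one row.

```pig
-- Assumption: 'a.txt' is a placeholder input; aliases are illustrative only.
a = LOAD 'a.txt';
b = GROUP a ALL;
-- c has a single row, so c.total can be used as a scalar
c = FOREACH b GENERATE COUNT(a) AS total;
d = ORDER a BY $0;
-- keep roughly the first 1% of the sorted rows (Pig 0.10+)
e = LIMIT d c.total / 100;
```

On versions older than 0.10 the last statement is rejected, because LIMIT only accepts a constant there.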