Pig: multiple ORDER BY DESC

Time: 2015-09-18 03:08:42

Tags: apache-pig

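I want to get the latest date for each cid, and also the latest amount on that same date. For the latest date, I implemented the following: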

    A = LOAD '$input' AS (cid:chararray, date:chararray, amt:chararray, tid:chararray, time:chararray);
    B = FOREACH (GROUP A BY (cid,tid)) {
        sort = ORDER A BY date DESC;
        latest = LIMIT sort 1;
        GENERATE FLATTEN(latest);
    };

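But I also want the latest amount. Because I have multiple records on the same date, I tried to get the amount by ordering by time, as shown below.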

    AMT = FOREACH (GROUP B BY (cid,tid)) {
        sort1 = ORDER B BY time DESC;
        lastamt = LIMIT sort1 1;
        GENERATE FLATTEN(lastamt.amt);
    };

Input:

 9822736906^A2015-08-02^A146.08^A^A21:57:05.000000
 9822736906^A2015-08-02^A250.12^A58926968^A22:45:30.000000
 9822736906^A2015-08-02^A132.1^A00000000^A22:55:29.000000
 9822736906^A2015-08-02^A60.97^A00000000^A23:02:48.000000
 9826964132^A2015-08-05^A98.2^A^A23:05:46.000000
 9822736906^A2015-08-05^A85.71^A4F7581^A23:12:22.000000
 9822736906^A2015-08-05^A655.73^A00000000^A23:17:24.000000

Output should be:

9822736906^A2015-08-05^A655.73^A00000000^A23:17:24.000000 
9826964132^A2015-08-05^A98.2^A^A23:05:46.000000

9822736906^A2015-08-02^A60.97^A00000000^A23:02:48.000000

1 answer:

Answer 0 (score: 4)

If the goal is to select the latest record for each cid, the snippet below will work.

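Sort by date and then by time, both descending, in the same ORDER BY operator. Because both fields are fixed-width strings (yyyy-MM-dd and HH:mm:ss.SSSSSS), comparing them as chararray gives chronological order.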

Input:

9822736906  2015-08-02  146.08      21:57:05.000000
9822736906  2015-08-02  250.12  58926968    22:45:30.000000
9822736906  2015-08-02  132.1   00000000    22:55:29.000000
9822736906  2015-08-02  60.97   00000000    23:02:48.000000
9826964132  2015-08-05  98.2        23:05:46.000000
9822736906  2015-08-05  85.71   4F7581  23:12:22.000000
9822736906  2015-08-05  655.73  00000000    23:17:24.000000

Pig script:

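A = LOAD 'a.csv' USING PigStorage('\t') AS (cid:chararray, date:chararray, amt:chararray, tid:chararray, time:chararray);
-- Group by cid only, so each customer yields exactly one row.
B = GROUP A BY cid;
C = FOREACH B {
    -- Newest date first; within the same date, newest time first.
    sort = ORDER A BY date DESC, time DESC;
    latest = LIMIT sort 1;
    GENERATE FLATTEN(latest);
};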

Output of DUMP C:

(9822736906,2015-08-05,655.73,00000000,23:17:24.000000)
(9826964132,2015-08-05,98.2,,23:05:46.000000)
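
The script above reads a tab-separated copy of the data. For the `^A`-delimited input shown in the question, the same approach might look like the sketch below. It assumes that `^A` stands for the Ctrl-A character (`\u0001`) and that `$input` and `$output` are parameters supplied on the command line; `$output` is a hypothetical name that does not appear in the question.

```pig
-- Assumption: fields are separated by Ctrl-A (\u0001), as the ^A in the sample suggests.
A = LOAD '$input' USING PigStorage('\u0001')
    AS (cid:chararray, date:chararray, amt:chararray, tid:chararray, time:chararray);

-- Group by cid alone. Grouping by (cid, tid), as in the question,
-- would return one row per cid/tid pair instead of one per cid.
B = GROUP A BY cid;

C = FOREACH B {
    -- A single ORDER BY with two keys replaces the two separate sorting passes.
    sorted = ORDER A BY date DESC, time DESC;
    latest = LIMIT sorted 1;
    GENERATE FLATTEN(latest);
};

-- '$output' is a hypothetical parameter; use DUMP C instead to inspect results.
STORE C INTO '$output' USING PigStorage('\u0001');
```

Doing the sort in one pass matters here: the question's first step already keeps only one arbitrary record per group for the latest date, so a second pass that orders by time has nothing left to choose from.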