Need the timestamp of the max value in Pig Latin

Time: 2017-12-26 19:13:49

Tags: apache-pig

The input data is as follows:

(1,a,1,2)
(2,a,2,4)
(5,a,7,5)
(6,a,3,1)
(8,a,4,3)
(3,a,8,6)
(7,a,5,8)
(4,a,6,7)

The script is as follows:

a = load '/tmp/data/data' using PigStorage(',') as (timestamp:chararray, constant:chararray, data1:chararray, data2:chararray);
b = FOREACH (GROUP a BY (constant)) {
    ord4 = ORDER a BY timestamp DESC;
    top4 = LIMIT ord4 1;
    GENERATE FLATTEN(top4), MAX(a.data1) as data, MAX(a.data2) as data2;
}
g4 = FOREACH b GENERATE top4::timestamp AS timestamp,
                        top4::constant AS constant,
                        top4::data1 AS curr_data1,
                        top4::data2 AS curr_data2,
                        data1 as data1,
                        data2 as data2;
dump g4;

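Running this produces the following output:

(8,a,4,3,8,8)

I also need the timestamps of the rows where those maxima occur: 3 for data1 and 7 for data2.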

Could you tell me how to achieve this?

Thanks a lot in advance.

1 Answer:

Answer 0: (score: 0)

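You have only generated 6 fields in g4, hence the output ((8,a,4,3),8,8).

Please be more specific when you say

Also need the timestamp of data1 as 3 and data2 as 7.

What are the expected results for the last two fields?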
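
If the intent is to report the timestamp of the row that holds each maximum, one possible approach is to order the grouped bag by each data column inside the nested FOREACH and keep the top row. This is only a sketch and has not been run against the asker's environment. The aliases by_ts, latest, by_d1, top_d1, by_d2, top_d2 and the output names max_data1, max_data2, data1_ts, data2_ts are hypothetical, and the columns are assumed to load as int so that ORDER and MAX compare numerically rather than as strings.

```pig
-- Assumption: the columns are loaded as int instead of chararray.
a = LOAD '/tmp/data/data' USING PigStorage(',')
    AS (timestamp:int, constant:chararray, data1:int, data2:int);

b = FOREACH (GROUP a BY constant) {
    -- Row with the latest timestamp (same idea as ord4/top4 in the question).
    by_ts  = ORDER a BY timestamp DESC;
    latest = LIMIT by_ts 1;

    -- Row holding the largest data1 in the group.
    by_d1  = ORDER a BY data1 DESC;
    top_d1 = LIMIT by_d1 1;

    -- Row holding the largest data2 in the group.
    by_d2  = ORDER a BY data2 DESC;
    top_d2 = LIMIT by_d2 1;

    GENERATE FLATTEN(latest),
             MAX(a.data1) AS max_data1,
             MAX(a.data2) AS max_data2,
             FLATTEN(top_d1.timestamp) AS data1_ts,
             FLATTEN(top_d2.timestamp) AS data2_ts;
};

DUMP b;
```

If several rows share the same maximum, LIMIT 1 keeps only one of them, so a tie-breaking column would need to be added to the ORDER clause when that matters.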