Hive Windowing: Additional Output

Time: 2018-12-02 22:42:46

Tags: hive

Given the following data:

CREATE TABLE dat (dt STRING, uxt BIGINT, temp FLOAT, city STRING);
INSERT INTO dat VALUES
('1/1/2000 0:53', 946687980, 100, 'A'),
('1/1/2000 0:59', 946688340, 28.9, 'A'),
('1/1/2000 13:54', 946734840, -1, 'A'),
('1/1/2000 13:55', 946734900, 30.9, 'A'),
('1/1/2000 22:53', 946767180, 30.9, 'A'),
('1/1/2000 22:59', 946767540, 30, 'A'),
('1/2/2000 1:25', 946776300, 121, 'A'),
('1/2/2000 1:53', 946777980, 28.9, 'A'),
('1/2/2000 2:53', 946781580, 28.9, 'A'),
('1/3/2000 1:53', 946864380, 10, 'A'),
('1/3/2000 11:20', 946898400, 15.1, 'A'),
('1/3/2000 11:53', 946900380, 18, 'A'),
('1/3/2000 21:00', 946933200, 17.1, 'A'),
('1/3/2000 21:53', 946936380, 16, 'A');

I am using window functions to find the max temp, min temp, and so on over the 24 hours (86400 seconds) following each row:

select dt, uxt, maxtemp, mintemp, ABS(maxtemp - mintemp) as tempDiff, city
from (
  select dt, uxt,
         max(temp) over (w) as maxtemp,
         min(temp) over (w) as mintemp,
         city
  from dat
  WINDOW w as (partition by city order by uxt
               range between CURRENT ROW and 86400 FOLLOWING)
) t1
order by tempDiff desc;

This gives me the following output (first row):

dt                uxt        maxtemp  mintemp  tempdiff  city
2000-01-01 13:54  946734840  121.0    -1.0     122.0     A

I would like to add to the output the 'dt' at which the maximum temperature occurred, and I am struggling to find a solution.

The first row of the desired output would look like this:

dt                uxt        maxtemp  mintemp  tempdiff  city  maxdt
2000-01-01 13:54  946734840  121      -1       122       A     '2000-01-02 01:25'
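
As a cross-check outside Hive, the desired row can be reproduced with a small Python sketch over the sample data. It assumes the frame is inclusive at both ends, as with RANGE BETWEEN CURRENT ROW AND 86400 FOLLOWING, and that the earliest reading wins when two readings tie for the maximum. The function name forward_window_stats is made up for this illustration and is not part of Hive.

```python
# Sample rows from the question: (dt, uxt, temp, city).
ROWS = [
    ("1/1/2000 0:53", 946687980, 100.0, "A"),
    ("1/1/2000 0:59", 946688340, 28.9, "A"),
    ("1/1/2000 13:54", 946734840, -1.0, "A"),
    ("1/1/2000 13:55", 946734900, 30.9, "A"),
    ("1/1/2000 22:53", 946767180, 30.9, "A"),
    ("1/1/2000 22:59", 946767540, 30.0, "A"),
    ("1/2/2000 1:25", 946776300, 121.0, "A"),
    ("1/2/2000 1:53", 946777980, 28.9, "A"),
    ("1/2/2000 2:53", 946781580, 28.9, "A"),
    ("1/3/2000 1:53", 946864380, 10.0, "A"),
    ("1/3/2000 11:20", 946898400, 15.1, "A"),
    ("1/3/2000 11:53", 946900380, 18.0, "A"),
    ("1/3/2000 21:00", 946933200, 17.1, "A"),
    ("1/3/2000 21:53", 946936380, 16.0, "A"),
]

WINDOW_SECONDS = 86400  # 24 hours, as in the question's RANGE frame


def forward_window_stats(rows, window=WINDOW_SECONDS):
    """For each row, summarise the same-city rows whose uxt lies in
    [uxt, uxt + window] (both ends inclusive)."""
    results = []
    for dt, uxt, _temp, city in rows:
        frame = [r for r in rows
                 if r[3] == city and uxt <= r[1] <= uxt + window]
        # Hottest reading in the frame; the earliest uxt wins a tie
        # (an assumption, the question does not say how ties should resolve).
        hottest = max(frame, key=lambda r: (r[2], -r[1]))
        mintemp = min(r[2] for r in frame)
        results.append({
            "dt": dt,
            "uxt": uxt,
            "maxtemp": hottest[2],
            "mintemp": mintemp,
            "tempdiff": abs(hottest[2] - mintemp),
            "city": city,
            "maxdt": hottest[0],
        })
    # Largest spread first, like ORDER BY tempDiff DESC.
    return sorted(results, key=lambda r: r["tempdiff"], reverse=True)


if __name__ == "__main__":
    print(forward_window_stats(ROWS)[0])
```

On this data the first result is the 1/1/2000 13:54 row (uxt 946734840 in the sample data) with maxtemp 121.0, mintemp -1.0, tempdiff 122.0 and maxdt '1/2/2000 1:25', which is the row the Hive query is expected to return.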

This query, which uses first_value:

select dt
  ,uxt
  ,max(temp) over w as maxtemp
  ,min(temp) over w as mintemp
  ,abs(max(temp) over w - min(temp) over w) as tempDiff
  ,first_value(dt) over (w order by temp desc) as maxdt
  ,city
from dat
order by tempDiff desc
WINDOW w as (partition by city order by uxt 
         range between CURRENT ROW and 86400 FOLLOWING);

produces this output (first two rows):

dt             uxt        maxtemp  mintemp  tempdiff  maxdt          city
1/1/2000 0:59  946688340  121.0    -1.0     122.0     1/2/2000 1:53  A
1/1/2000 0:53  946687980  121.0    -1.0     122.0     1/1/2000 0:53  A

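The maximum temperature it reports does not fall within the 24-hour window. Also, in the second row, maxdt is 1/1/2000 0:53, but the temperature at that time is not 121.0 (it is 100).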
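
The claim that 121.0 lies outside the 24-hour window for these two rows can be checked with plain arithmetic on the uxt values. The sketch below is only an illustration in Python. The helper names seconds_to_peak and peak_in_frame are hypothetical, and the timestamps are copied from the sample data.

```python
WINDOW_SECONDS = 86400  # 24-hour frame used in the question

# Timestamp of the 121-degree reading ('1/2/2000 1:25') in the sample data.
PEAK_UXT = 946776300


def seconds_to_peak(row_uxt, peak_uxt=PEAK_UXT):
    """Seconds from a row's timestamp forward to the 121-degree reading."""
    return peak_uxt - row_uxt


def peak_in_frame(row_uxt, window=WINDOW_SECONDS):
    """True if the peak falls inside [row_uxt, row_uxt + window]."""
    return 0 <= seconds_to_peak(row_uxt) <= window


if __name__ == "__main__":
    for dt, uxt in [("1/1/2000 0:53", 946687980),
                    ("1/1/2000 0:59", 946688340),
                    ("1/1/2000 13:54", 946734840)]:
        print(dt, seconds_to_peak(uxt), peak_in_frame(uxt))
```

The gaps are 88320 seconds for 0:53 and 87960 seconds for 0:59, both longer than 86400, so neither row should see 121.0. The 13:54 row is only 41460 seconds before the peak, which is why it is the row expected to report maxtemp 121.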

1 Answer:

Answer 0 (score: 0)

This can be achieved by including the following first_value window function in the inner query.

first_value(dt) over (partition by city order by uxt,temp desc 
                      range between CURRENT ROW and 7200 FOLLOWING)

Also note that the query can be simplified to the following (the subquery is dropped because it is not needed in this case):

select dt
      ,uxt
      ,max(temp) over w as maxtemp
      ,min(temp) over w as mintemp
      ,abs(max(temp) over w - min(temp) over w) as tempDiff
      ,first_value(dt) over (w order by temp desc) as maxdt
      ,city
from dat
WINDOW w as (partition by city order by uxt 
             range between CURRENT ROW and 7200 FOLLOWING)