Difference between two records in Hive

Time: 2019-07-12 06:35:22

Tags: sql hadoop hive count

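I have a table with 5 columns, and I need to find the difference in the count column between the top two records. I can already fetch the top two records based on a condition. For example,

my table looks like this: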

name   address  count  current_date_time
john   LA       102    2019-07-12 12:24:38
peter  MAC      105    2019-07-12 12:24:40
john   NY       210    2019-07-12 12:24:02
john   WD       18     2019-07-12 12:24:12

The SELECT query that fetches the top two rows:

SELECT count 
FROM table_name 
WHERE name="john" 
ORDER BY current_date_time DESC LIMIT 2

It returns:

count
102
18

But what I need is the difference between 102 and 18.

How should I write the subquery?

2 Answers:

Answer 0: (score: 3)

Apply the lead() window analytic function to get the column value from the next row.

-- Latest count minus the count of the row that follows it in descending time order
SELECT count - ld AS difference
  FROM
 (
  SELECT count, lead(count,1,0) OVER (ORDER BY current_date_time DESC) AS ld,
         current_date_time 
    FROM table_name 
   WHERE name = 'john' 
   ORDER BY current_date_time DESC LIMIT 2
 ) q
ORDER BY q.current_date_time DESC LIMIT 1

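In lead(count,1,0), the second argument 1 is the offset, meaning one row ahead, and the third argument 0 is the default value returned when no such row exists.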
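To make the offset and default arguments concrete, here is a minimal sketch that lists what lead() returns for each of john's rows. The sample rows are inlined with UNION ALL so the query is self-contained, and the column names cnt and ts are hypothetical stand-ins for count and current_date_time, not names from the original table.

```sql
-- Sketch only: cnt and ts are illustrative names standing in for
-- the question's count and current_date_time columns.
WITH sample AS (
  SELECT 'john' AS name, 102 AS cnt, '2019-07-12 12:24:38' AS ts
  UNION ALL SELECT 'john', 210, '2019-07-12 12:24:02'
  UNION ALL SELECT 'john', 18,  '2019-07-12 12:24:12'
)
SELECT ts,
       cnt,
       -- value of cnt one row further down the descending time order,
       -- or 0 when there is no such row
       lead(cnt, 1, 0) OVER (ORDER BY ts DESC)       AS ld,
       cnt - lead(cnt, 1, 0) OVER (ORDER BY ts DESC) AS diff
  FROM sample
 ORDER BY ts DESC;
-- ts                  | cnt | ld  | diff
-- 2019-07-12 12:24:38 | 102 | 18  | 84
-- 2019-07-12 12:24:12 | 18  | 210 | -192
-- 2019-07-12 12:24:02 | 210 | 0   | 210   (no following row, so the default 0 applies)
```

Only the first row carries the difference asked for (102 - 18 = 84), which is why the answer's outer query keeps a single row.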

Demo in PostgreSQL (Hive has similar syntax)

Answer 1: (score: 1)

Use the lead or lag analytic functions to address the previous/next row in an ordering by some column:

For example:

with your_data as (
select stack(4,
'john'  ,'LA'  ,   102, '2019-07-12 12:24:38',
'peter' ,'MAC' ,   105, '2019-07-12 12:24:40',
'john'  ,'NY'  ,   210, '2019-07-12 12:24:02',
'john'  ,'WD'  ,   18 , '2019-07-12 12:24:12'
) as (name, address, count, current_date_time)
)

select prev_count-count from
(
select s.*, lag(count) over(partition by name order by current_date_time) prev_count,
       row_number() over(partition by name order by current_date_time desc) rn
  from your_data s 
  where name="john" 
)s where rn=2;

Returns:

OK
192
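
The query above subtracts in chronological order, so it yields 210 - 18 = 192. If what is wanted is the latest count minus the one before it (102 - 18, as in the question), one possible variant is sketched below. It swaps lag for row_number() plus conditional aggregation, which is a different technique from the answer's, and it reports the difference for every name at once. The column names cnt and ts are hypothetical stand-ins for count and current_date_time.

```sql
-- Sketch only: cnt and ts are illustrative names, not the original columns.
WITH sample AS (
  SELECT 'john' AS name, 102 AS cnt, '2019-07-12 12:24:38' AS ts
  UNION ALL SELECT 'peter', 105, '2019-07-12 12:24:40'
  UNION ALL SELECT 'john',  210, '2019-07-12 12:24:02'
  UNION ALL SELECT 'john',  18,  '2019-07-12 12:24:12'
),
ranked AS (
  -- rn = 1 is the latest row per name, rn = 2 the one before it
  SELECT name, cnt,
         row_number() OVER (PARTITION BY name ORDER BY ts DESC) AS rn
    FROM sample
)
SELECT name,
       max(CASE WHEN rn = 1 THEN cnt END)
     - max(CASE WHEN rn = 2 THEN cnt END) AS diff
  FROM ranked
 WHERE rn <= 2
 GROUP BY name;
-- Result rows (output order is not guaranteed):
-- john  | 84
-- peter | NULL   (peter has only one row, so there is nothing to subtract)
```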