Hive window function: last value of the previous partition

Date: 2018-09-03 09:52:00

Tags: sql hive

Using Hive window functions, I want to get the last value of the previous partition. The following query:

SELECT
  name,
  rank,
  first_value(rank) over (partition by type order by rank) as new_rank
FROM my_table

gives:

| name | rank | type | new_rank |
| one  | 1    | T1   |   1      |
| two  | 2    | T2   |   2      |
| thr  | 3    | T2   |   2      |
| fou  | 4    | T1   |   4      |
| fiv  | 5    | T2   |   5      |
| six  | 6    | T2   |   5      |
| sev  | 7    | T2   |   5      |

But what I need is the "last value of the previous partition":

| name | rank | type | new_rank |
| one  | 1    | T1   |   NULL   |
| two  | 2    | T2   |   1      |
| thr  | 3    | T2   |   1      |
| fou  | 4    | T1   |   3      |
| fiv  | 5    | T2   |   4      |
| six  | 6    | T2   |   4      |
| sev  | 7    | T2   |   4      |

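1 answer:

Answer 0 (score: 0)

This seems tricky. It is a variation of the "gaps-and-islands" problem. Here is the idea:

  1. Identify the "islands" of consecutive rows with the same type (using a difference of row numbers).
  2. Then use lag() to bring the previous rank into the island.
  3. Take the min() over each island to get the new rank you want.

So: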

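with gi as (
      -- Step 1: consecutive rows of the same type share the same (type, grp).
      select t.*,
             (seqnum - seqnum_t) as grp
      from (select t.*,
                   row_number() over (partition by type order by rank) as seqnum_t,
                   row_number() over (order by rank) as seqnum
            from t  -- t stands for the question's my_table
           ) t
     ),
     gi2 as (
      -- Step 2: rank of the immediately preceding row (NULL for the first row).
      select gi.*, lag(rank) over (order by gi.rank) as prev_rank
      from gi
     )
-- Step 3: the smallest prev_rank in an island is the last rank of the island before it.
select gi2.*,
       min(prev_rank) over (partition by type, grp) as new_rank
from gi2
order by rank;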

Here is a SQL Fiddle (albeit using Postgres).
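
To see why the difference of row numbers identifies the islands, here is a minimal Python sketch that replays the three steps on the question's sample rows. It is only an illustration of the idea, not Hive code; the function name `last_value_of_previous_island` and the tuple layout are made up for this example.

```python
from collections import defaultdict

# Sample rows from the question: (name, rank, type).
rows = [
    ("one", 1, "T1"),
    ("two", 2, "T2"),
    ("thr", 3, "T2"),
    ("fou", 4, "T1"),
    ("fiv", 5, "T2"),
    ("six", 6, "T2"),
    ("sev", 7, "T2"),
]


def last_value_of_previous_island(rows):
    """Return (name, rank, type, grp, new_rank) tuples ordered by rank.

    Hypothetical helper that mirrors the three steps of the answer.
    """
    ordered = sorted(rows, key=lambda r: r[1])

    per_type_count = defaultdict(int)  # plays the role of seqnum_t
    annotated = []
    prev_rank = None                   # plays the role of lag(rank)
    for seqnum, (name, rank, typ) in enumerate(ordered, start=1):
        per_type_count[typ] += 1
        # Step 1: grp = overall row number minus per-type row number.
        grp = seqnum - per_type_count[typ]
        # Step 2: attach the rank of the immediately preceding row.
        annotated.append((name, rank, typ, grp, prev_rank))
        prev_rank = rank

    # Step 3: minimum non-NULL prev_rank per (type, grp) island.
    island_prev = defaultdict(list)
    for _, _, typ, grp, prev in annotated:
        if prev is not None:
            island_prev[(typ, grp)].append(prev)

    return [
        (name, rank, typ, grp, min(island_prev[(typ, grp)], default=None))
        for name, rank, typ, grp, _ in annotated
    ]


for row in last_value_of_previous_island(rows):
    print(row)
```

On the sample data, grp comes out as 0, 1, 1, 2, 2, 2, 2. Row "fou" (T1) and rows "fiv" to "sev" (T2) share grp 2 even though they belong to different islands, which is why the final window partitions by both type and grp rather than by grp alone.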