Lag function to get the last different value (Redshift)

Time: 2017-06-20 06:48:15

Tags: sql amazon-redshift lag

I have the sample data below and want to get the desired output. Please help me with some ideas.

I want the prev_diff_value output for the 3rd and 4th rows to be 2015-01-01 00:00:00 instead of 2015-01-02 00:00:00.

with dat as (
            select 1 as id,'20150101 02:02:50'::timestamp as dt union all
            select 1,'20150101 03:02:50'::timestamp union all
            select 1,'20150101 04:02:50'::timestamp union all
            select 1,'20150102 02:02:50'::timestamp union all
            select 1,'20150102 02:02:50'::timestamp union all
            select 1,'20150102 02:02:51'::timestamp union all
            select 1,'20150103 02:02:50'::timestamp union all
            select 2,'20150101 02:02:50'::timestamp union all
            select 2,'20150101 03:02:50'::timestamp union all
            select 2,'20150101 04:02:50'::timestamp union all
            select 2,'20150102 02:02:50'::timestamp union all
            select 1,'20150104 02:02:50'::timestamp
            )-- select * from dat
   select id , dt , lag(trunc(dt)) over(partition by id order by dt asc) prev_diff_value
   from dat
  order by id,dt desc
Output:
   id   dt                    prev_diff_value
   1    2015-01-04 02:02:50   2015-01-03 00:00:00
   1    2015-01-03 02:02:50   2015-01-02 00:00:00
   1    2015-01-02 02:02:51   2015-01-02 00:00:00
   1    2015-01-02 02:02:50   2015-01-02 00:00:00
   1    2015-01-02 02:02:50   2015-01-01 00:00:00

2 answers:

Answer 0 (score: 2)

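As I understand it, you want the previous distinct date for each timestamp within the id partition. I would apply lag to the unique combinations of id and date, and then join the result back to the original dataset like this: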

with dat as (
    select 1 as id,'20150101 02:02:50'::timestamp as dt union all
    select 1,'20150101 03:02:50'::timestamp union all
    select 1,'20150101 04:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:51'::timestamp union all
    select 1,'20150103 02:02:50'::timestamp union all
    select 2,'20150101 02:02:50'::timestamp union all
    select 2,'20150101 03:02:50'::timestamp union all
    select 2,'20150101 04:02:50'::timestamp union all
    select 2,'20150102 02:02:50'::timestamp union all
    select 1,'20150104 02:02:50'::timestamp
)
,dat_unique_lag as (
    select *, lag(date) over(partition by id order by date asc) prev_diff_value
    from (
        select distinct id,trunc(dt) as date
        from dat
    )
)
select *
from dat
join dat_unique_lag
using (id)
where trunc(dat.dt)=dat_unique_lag.date
order by id,dt desc;

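However, this is not super efficient. If the nature of your data is such that you have a limited number of timestamps on the same day, you can instead extend the lag with a conditional statement like this: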

with dat as (
    select 1 as id,'20150101 02:02:50'::timestamp as dt union all
    select 1,'20150101 03:02:50'::timestamp union all
    select 1,'20150101 04:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:51'::timestamp union all
    select 1,'20150103 02:02:50'::timestamp union all
    select 2,'20150101 02:02:50'::timestamp union all
    select 2,'20150101 03:02:50'::timestamp union all
    select 2,'20150101 04:02:50'::timestamp union all
    select 2,'20150102 02:02:50'::timestamp union all
    select 1,'20150104 02:02:50'::timestamp
)
select id, dt,
case 
    when lag(trunc(dt)) over(partition by id order by dt asc)=trunc(dt)
    then case 
        when lag(trunc(dt),2) over(partition by id order by dt asc)=trunc(dt)
        then case
            when lag(trunc(dt),3) over(partition by id order by dt asc)=trunc(dt)
            then lag(trunc(dt),4) over(partition by id order by dt asc)
            else lag(trunc(dt),3) over(partition by id order by dt asc)
            end
        else lag(trunc(dt),2) over(partition by id order by dt asc)
        end
    else lag(trunc(dt)) over(partition by id order by dt asc)
end as prev_diff_value
from dat
order by id,dt desc;

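Basically, you look at the previous record; if it falls on the same date as the current row, you look back at the record before that one, and so on, until you find a record with a different date or run out of depth in your statement. Here the statement looks back as far as the 4th previous record, so it only works when a date has at most 4 timestamps per id.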
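For anyone who wants to check the look-back logic outside the database, here is a minimal Python sketch of the same idea without the fixed depth. It is not Redshift code, and the function name `prev_diff_value` and the tuple layout `(id, dt)` are my own assumptions for illustration. It returns a plain date where the SQL shows a truncated timestamp.

```python
from datetime import datetime
from itertools import groupby


def prev_diff_value(rows):
    """For each (id, dt) row, find the date of the nearest earlier row
    in the same id partition whose date differs from the row's own date.

    Returns a list of (id, dt, previous_date_or_None), ordered by id, dt.
    """
    out = []
    ordered = sorted(rows)  # partition by id, order by dt asc
    for id_, group in groupby(ordered, key=lambda r: r[0]):
        part = [dt for _, dt in group]
        for i, dt in enumerate(part):
            j = i - 1
            # Keep stepping back while the earlier row is on the same day.
            while j >= 0 and part[j].date() == dt.date():
                j -= 1
            out.append((id_, dt, part[j].date() if j >= 0 else None))
    return out


# A few rows from the question's sample data (id 1 only).
rows = [
    (1, datetime(2015, 1, 1, 2, 2, 50)),
    (1, datetime(2015, 1, 2, 2, 2, 50)),
    (1, datetime(2015, 1, 2, 2, 2, 50)),
    (1, datetime(2015, 1, 2, 2, 2, 51)),
    (1, datetime(2015, 1, 3, 2, 2, 50)),
]
for row in prev_diff_value(rows):
    print(row)
```

Unlike the nested case statement, the loop keeps going back as far as it needs to, so it does not depend on how many timestamps share a day.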

Answer 1 (score: 1)

Here is a different way of looking at the problem. It is not efficient, but it is quite fun.

with dat as (
    select 1 as id,'20150101 02:02:50'::timestamp as dt union all
    select 1,'20150101 03:02:50'::timestamp union all
    select 1,'20150101 04:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:51'::timestamp union all
    select 1,'20150103 02:02:50'::timestamp union all
    select 2,'20150101 02:02:50'::timestamp union all
    select 2,'20150101 03:02:50'::timestamp union all
    select 2,'20150101 04:02:50'::timestamp union all
    select 2,'20150102 02:02:50'::timestamp union all
    select 1,'20150104 02:02:50'::timestamp
)
select distinct
dat.id
,dat.dt
,last_value(dat2.d) over (partition by dat.id, dat.dt order by dat2.d asc rows between unbounded preceding and unbounded following) as prev_diff_value
from dat
left join (
    select distinct
    id
    ,trunc(dt) as d
    from dat) dat2 on dat.id = dat2.id and trunc(dat.dt) > dat2.d
order by 1,2,3;

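This maps out the distinct id and date pairs and joins them back onto the dataset only where the joined date is earlier than the date of the row in question. The last_value function then picks the last (that is, the latest) of those earlier dates for each row, and distinct removes the duplicate rows the join produces. I know this question is a few years old, but I stumbled across it and had fun playing with it.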
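To make the join logic easier to follow, here is a rough Python model of what the query does. This is only a sketch under my own naming (`prev_diff_by_join` is hypothetical), not something Redshift runs. Taking the maximum of the earlier dates stands in for last_value over the full frame ordered by date ascending.

```python
from datetime import datetime


def prev_diff_by_join(rows):
    """Model of the join approach: pair each row with every earlier
    distinct date for the same id, then keep the latest of those dates.

    Returns distinct (id, dt, previous_date_or_None), ordered by id, dt.
    """
    # select distinct id, trunc(dt) as d from dat
    pairs = {(id_, dt.date()) for id_, dt in rows}

    out = set()  # a set plays the role of "select distinct"
    for id_, dt in rows:
        # left join ... on dat.id = dat2.id and trunc(dat.dt) > dat2.d
        earlier = [d for pid, d in pairs if pid == id_ and d < dt.date()]
        # last_value(... order by d asc) over the whole frame is the max;
        # no match in the left join leaves a NULL, modelled here as None.
        out.add((id_, dt, max(earlier) if earlier else None))
    return sorted(out, key=lambda r: (r[0], r[1]))


# A few rows from the question's sample data.
rows = [
    (1, datetime(2015, 1, 1, 2, 2, 50)),
    (1, datetime(2015, 1, 2, 2, 2, 50)),
    (1, datetime(2015, 1, 2, 2, 2, 50)),
    (1, datetime(2015, 1, 2, 2, 2, 51)),
    (2, datetime(2015, 1, 1, 2, 2, 50)),
]
for row in prev_diff_by_join(rows):
    print(row)
```

Note that, as in the query, exact duplicate rows (same id and dt) collapse into one row in the output.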