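How do I skip a row using the lag function? PostgreSQL 9.3

Asked: 2018-01-05 02:46:33

Tags: sql database postgresql

I am currently trying to write a query against a table that lists users' web browsing activity. The table looks like the one below: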

**RecordID  RespondentID    DeviceID    UTCTimestamp    Domain**
1   01faca75-1216-4a55-b43c-9d64ade852f7    4DF57C06-F0BD-4779-8983-37A8B02E5EDF    06/11/2017 10:21    goodreads.com 
2   01faca75-1216-4a55-b43c-9d64ade852f7    4DF57C06-F0BD-4779-8983-37A8B02E5EDF    06/11/2017 10:21    goodreads.com 
3   01faca75-1216-4a55-b43c-9d64ade852f7    4DF57C06-F0BD-4779-8983-37A8B02E5EDF    06/11/2017 10:21    gr-assets.com 
4   01faca75-1216-4a55-b43c-9d64ade852f7    4DF57C06-F0BD-4779-8983-37A8B02E5EDF    06/11/2017 10:21    gr-assets.com 
5   01faca75-1216-4a55-b43c-9d64ade852f7    4DF57C06-F0BD-4779-8983-37A8B02E5EDF    06/11/2017 10:23    itunes.apple.com 
6   01faca75-1216-4a55-b43c-9d64ade852f7    4DF57C06-F0BD-4779-8983-37A8B02E5EDF    06/11/2017 10:23    itunes.apple.com 
7   01faca75-1216-4a55-b43c-9d64ade852f7    4DF57C06-F0BD-4779-8983-37A8B02E5EDF    06/11/2017 10:51    samplicio.us 
8   01faca75-1216-4a55-b43c-9d64ade852f7    4DF57C06-F0BD-4779-8983-37A8B02E5EDF    06/11/2017 10:51    samplicio.us

Thanks to everyone's help, I managed to get this result:

**RecordID    RespondentID    UTCTimestamp    Source Domain    To Domain    RecordID**

2   01faca75-1216-4a55-b43c-9d64ade852f7    06/11/2017 10:21    goodreads.com   gr-assets.com   3
4   01faca75-1216-4a55-b43c-9d64ade852f7    06/11/2017 10:21    gr-assets.com   itunes.apple.com    5
6   01faca75-1216-4a55-b43c-9d64ade852f7    06/11/2017 10:23    itunes.apple.com    samplicio.us    7

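To Domain is the value from the next row whose domain is different.

Problem: although this looks correct, the first record has actually been skipped entirely. Given the dataset, the Domain of row 1 is joined to the Domain of row 2, and because the two are equal, that pair is skipped. Row 2 is then combined with row 3, so the first result record shows RecordID 2. I would like to fine-tune this further. My result should start at RecordID 1 and skip RecordID 2, because its domain is the same, so the result should be:

**RecordID    RespondentID    UTCTimestamp    Source Domain    To Domain**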

1   01faca75-1216-4a55-b43c-9d64ade852f7    06/11/2017 10:21    goodreads.com   gr-assets.com   
3   01faca75-1216-4a55-b43c-9d64ade852f7    06/11/2017 10:21    gr-assets.com   itunes.apple.com    
5   01faca75-1216-4a55-b43c-9d64ade852f7    06/11/2017 10:23    itunes.apple.com    samplicio.us    

I tried to skip RecordID 2, but I ran into an SQL error about 'prev_name'.

SELECT t1."RecordID", t1."RespondentID", t1."UTCTimestamp",
       t1."Domain" AS "Source Domain", t2."Domain" AS "To Domain", t2."RecordID",
       lag(t1."Domain", 1) OVER (ORDER BY t1."RecordID") AS prev_name
FROM public."Traffic - Mobile" AS t1
JOIN public."Traffic - Mobile" AS t2
  ON t2."RespondentID" = t1."RespondentID"
 AND t2."DeviceID" = t1."DeviceID"
 AND t2."RecordID" = t1."RecordID" + 1
 AND t1."Domain" <> t2."Domain"
 AND t2."UTCTimestamp" >= t1."UTCTimestamp"
 AND t2."Sequence" - t1."Sequence" = 1
 AND t1."RecordID" < 13
 AND t1."Domain" <> prev_name;

What am I doing wrong?

The final result I want to achieve is as follows:

**RecordID    RespondentID    UTCTimestamp    Source Domain    To Domain    Final Destination**

1   01faca75-1216-4a55-b43c-9d64ade852f7    06/11/2017 10:21    goodreads.com   gr-assets.com   samplicio.us
3   01faca75-1216-4a55-b43c-9d64ade852f7    06/11/2017 10:21    gr-assets.com   itunes.apple.com    samplicio.us
5   01faca75-1216-4a55-b43c-9d64ade852f7    06/11/2017 10:23    itunes.apple.com    samplicio.us    samplicio.us

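There is one more column, named "Final Destination". It lets me group the 3 transactions together as the path that leads to samplicio.us.

Thanks in advance.

1 Answer:

Answer 0: (score: 0)

Try this: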

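with t1 as
(
    select
        recordid,
        respondentid,
        deviceid,
        utctimestamp,
        domain,
        -- position of each row within one respondent/device history
        row_number() over (partition by respondentid, deviceid
                           order by utctimestamp, recordid) as user_seq,
        -- position of each row among that respondent/device's visits to the same domain
        row_number() over (partition by respondentid, deviceid, domain
                           order by utctimestamp, recordid) as user_domain_seq
    from traffic_mobile
)
select *
from
(
    select
        recordid,
        respondentid,
        deviceid,
        utctimestamp,
        domain,
        -- domain of the next kept row
        lead(domain) over (partition by respondentid, deviceid
                           order by user_seq) as next_domain,
        -- last domain of the history; the explicit frame lets last_value see the whole partition
        last_value(domain) over (partition by respondentid, deviceid
                                 order by user_seq
                                 rows between unbounded preceding
                                          and unbounded following) as final_domain
    from t1
    -- keep only the first row of each domain
    where user_domain_seq = 1
) t2
-- the last domain has no successor, so drop that row
where t2.next_domain is not null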

SQL Fiddle: sqlfiddle.com/#!17/dc248/3
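
As for the error in the question's query: `prev_name` is a select-list alias, and such an alias is not visible inside the `ON` clause. A window function cannot be written directly in `ON` or `WHERE` either, so the usual workaround is to compute `lag` in a subquery or CTE and filter one level further out. The following is a minimal sketch of that idea rather than a tested drop-in replacement. The inline `traffic_mobile` CTE, the shortened ids `r1` and `d1`, the reading of 06/11/2017 as 6 November 2017, and the names `with_prev`, `run_starts`, `prev_domain`, `to_domain` and `final_destination` are all assumptions made for illustration.

```sql
-- Inline sample data standing in for the real table (ids shortened, dates assumed dd/mm/yyyy).
-- The casts in the first row fix the column types for the remaining rows.
with traffic_mobile (recordid, respondentid, deviceid, utctimestamp, domain) as (
    values
        (1, 'r1'::text, 'd1'::text, timestamp '2017-11-06 10:21', 'goodreads.com'::text),
        (2, 'r1', 'd1', '2017-11-06 10:21', 'goodreads.com'),
        (3, 'r1', 'd1', '2017-11-06 10:21', 'gr-assets.com'),
        (4, 'r1', 'd1', '2017-11-06 10:21', 'gr-assets.com'),
        (5, 'r1', 'd1', '2017-11-06 10:23', 'itunes.apple.com'),
        (6, 'r1', 'd1', '2017-11-06 10:23', 'itunes.apple.com'),
        (7, 'r1', 'd1', '2017-11-06 10:51', 'samplicio.us'),
        (8, 'r1', 'd1', '2017-11-06 10:51', 'samplicio.us')
),
with_prev as (
    -- Step 1: attach the previous row's domain; this is where lag belongs.
    select t.*,
           lag(domain) over (partition by respondentid, deviceid
                             order by utctimestamp, recordid) as prev_domain
    from traffic_mobile t
),
run_starts as (
    -- Step 2: keep only rows whose domain differs from the previous row's.
    -- The first row has a null prev_domain, so "is distinct from" keeps it too.
    select *
    from with_prev
    where prev_domain is distinct from domain
)
-- Step 3: on the remaining rows, look one row ahead and to the end of the path.
select recordid, respondentid, utctimestamp,
       domain as source_domain, to_domain, final_destination
from (
    select r.*,
           lead(domain) over (partition by respondentid, deviceid
                              order by utctimestamp, recordid) as to_domain,
           last_value(domain) over (partition by respondentid, deviceid
                                    order by utctimestamp, recordid
                                    rows between unbounded preceding
                                             and unbounded following) as final_destination
    from run_starts r
) paths
where to_domain is not null   -- the last domain has no successor
order by recordid;
```

On the sample rows this should return RecordIDs 1, 3 and 5, each with samplicio.us as the final destination. Unlike the `user_domain_seq = 1` filter, this sketch keeps a row every time the domain changes, so a respondent who later returns to an earlier domain gets a new row for that visit.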

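P.S. For users who have only one entry in the traffic_mobile table, the query does not return a row. If you need those users, the query has to be extended to include them.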
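
One possible way to include those users, sketched under assumptions: count how many rows survive the `user_domain_seq = 1` filter per respondent and device, and let a lone surviving row through even though it has no next domain. The `kept_rows` column, the inline sample data, the shortened ids and the second respondent `r2` are hypothetical and only there to make the sketch self-contained.

```sql
-- Inline sample data (hypothetical): r1 visits two domains, r2 has a single entry.
-- The casts in the first row fix the column types for the remaining rows.
with traffic_mobile (recordid, respondentid, deviceid, utctimestamp, domain) as (
    values
        (1, 'r1'::text, 'd1'::text, timestamp '2017-11-06 10:21', 'goodreads.com'::text),
        (2, 'r1', 'd1', '2017-11-06 10:21', 'goodreads.com'),
        (3, 'r1', 'd1', '2017-11-06 10:21', 'gr-assets.com'),
        (4, 'r2', 'd2', '2017-11-06 10:30', 'example.com')
),
t1 as (
    select recordid, respondentid, deviceid, utctimestamp, domain,
           row_number() over (partition by respondentid, deviceid
                              order by utctimestamp, recordid) as user_seq,
           row_number() over (partition by respondentid, deviceid, domain
                              order by utctimestamp, recordid) as user_domain_seq
    from traffic_mobile
)
select recordid, respondentid, deviceid, utctimestamp, domain, next_domain, final_domain
from (
    select recordid, respondentid, deviceid, utctimestamp, domain,
           lead(domain) over (partition by respondentid, deviceid
                              order by user_seq) as next_domain,
           last_value(domain) over (partition by respondentid, deviceid
                                    order by user_seq
                                    rows between unbounded preceding
                                             and unbounded following) as final_domain,
           -- number of rows left for this respondent/device after the filter below
           count(*) over (partition by respondentid, deviceid) as kept_rows
    from t1
    where user_domain_seq = 1
) t2
where t2.next_domain is not null
   or t2.kept_rows = 1          -- lone row: keep it, with a null next_domain
order by recordid;
```

With this data the sketch should return RecordID 1 (goodreads.com to gr-assets.com) and RecordID 4, the single-entry user, with a null `next_domain`. Because `kept_rows` is counted after the filter, a user with several rows that all share one domain is treated the same way as a user with a single entry.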