I'm currently trying to write a query against a table that lists users' web-browsing activity. The table looks like the one below.
| RecordID | RespondentID | DeviceID | UTCTimestamp | Domain |
| --- | --- | --- | --- | --- |
| 1 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 4DF57C06-F0BD-4779-8983-37A8B02E5EDF | 06/11/2017 10:21 | goodreads.com |
| 2 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 4DF57C06-F0BD-4779-8983-37A8B02E5EDF | 06/11/2017 10:21 | goodreads.com |
| 3 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 4DF57C06-F0BD-4779-8983-37A8B02E5EDF | 06/11/2017 10:21 | gr-assets.com |
| 4 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 4DF57C06-F0BD-4779-8983-37A8B02E5EDF | 06/11/2017 10:21 | gr-assets.com |
| 5 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 4DF57C06-F0BD-4779-8983-37A8B02E5EDF | 06/11/2017 10:23 | itunes.apple.com |
| 6 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 4DF57C06-F0BD-4779-8983-37A8B02E5EDF | 06/11/2017 10:23 | itunes.apple.com |
| 7 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 4DF57C06-F0BD-4779-8983-37A8B02E5EDF | 06/11/2017 10:51 | samplicio.us |
| 8 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 4DF57C06-F0BD-4779-8983-37A8B02E5EDF | 06/11/2017 10:51 | samplicio.us |
Thanks to everyone's help, I managed to get this:
| RecordID | RespondentID | UTCTimestamp | Source Domain | To Domain | To RecordID |
| --- | --- | --- | --- | --- | --- |
| 2 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 06/11/2017 10:21 | goodreads.com | gr-assets.com | 3 |
| 4 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 06/11/2017 10:21 | gr-assets.com | itunes.apple.com | 5 |
| 6 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 06/11/2017 10:23 | itunes.apple.com | samplicio.us | 7 |
To Domain is the value from the next row whose domain name is different.
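Problem: Although this looks correct, we have actually skipped the very first record. Given the dataset, the first row's Domain is compared with the second row's Domain, and because they are equal that pair is skipped. Row 2 is then combined with row 3, so the first result record shows RecordID 2. I would like to fine-tune this further. My results should start at RecordID 1 and skip RecordID 2, because its domain is the same, so the result should show: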
| RecordID | RespondentID | UTCTimestamp | Source Domain | To Domain |
| --- | --- | --- | --- | --- |
| 1 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 06/11/2017 10:21 | goodreads.com | gr-assets.com |
| 3 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 06/11/2017 10:21 | gr-assets.com | itunes.apple.com |
| 5 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 06/11/2017 10:23 | itunes.apple.com | samplicio.us |
I tried to skip RecordID 2, but I get an SQL error on 'prev_name'.
    SELECT t1."RecordID",
           t1."RespondentID",
           t1."UTCTimestamp",
           t1."Domain" AS "Source Domain",
           t2."Domain" AS "To Domain",
           t2."RecordID",
           lag(t1."Domain", 1) OVER (ORDER BY t1."RecordID") AS prev_name
    FROM public."Traffic - Mobile" AS t1
    JOIN public."Traffic - Mobile" AS t2
      ON t2."RespondentID" = t1."RespondentID"
     AND t2."DeviceID" = t1."DeviceID"
     AND t2."RecordID" = t1."RecordID" + 1
     AND t1."Domain" <> t2."Domain"
     AND t2."UTCTimestamp" >= t1."UTCTimestamp"
     AND t2."Sequence" - t1."Sequence" = 1
     AND t1."RecordID" < 13
     AND t1."Domain" <> prev_name;
What am I doing wrong?
The final result I want to achieve is as follows:

| RecordID | RespondentID | UTCTimestamp | Source Domain | To Domain | Final Destination |
| --- | --- | --- | --- | --- | --- |
| 1 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 06/11/2017 10:21 | goodreads.com | gr-assets.com | samplicio.us |
| 3 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 06/11/2017 10:21 | gr-assets.com | itunes.apple.com | samplicio.us |
| 5 | 01faca75-1216-4a55-b43c-9d64ade852f7 | 06/11/2017 10:23 | itunes.apple.com | samplicio.us | samplicio.us |
There is one more column, named "Final Destination". It lets me group the 3 transactions together as the path that leads to samplicio.us.
Thanks in advance.
Answer 0 (score: 0)
Try this:
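    -- t1: number each respondent/device's rows in time order (user_seq) and,
    -- separately, within each domain for that respondent/device (user_domain_seq).
    with t1 as
    (
        select
            recordid,
            respondentid,
            deviceid,
            utctimestamp,
            domain,
            row_number() over (partition by respondentid, deviceid
                               order by utctimestamp, recordid) as user_seq,
            row_number() over (partition by respondentid, deviceid, domain
                               order by utctimestamp, recordid) as user_domain_seq
        from traffic_mobile
    )
    select *
    from
    (
        select
            recordid,
            respondentid,
            deviceid,
            utctimestamp,
            domain,
            -- domain of the next kept row for the same respondent/device
            lead(domain) over (partition by respondentid, deviceid
                               order by user_seq) as next_domain,
            -- last domain on the path; the explicit frame makes last_value
            -- look at the whole partition instead of stopping at the current row
            last_value(domain) over (partition by respondentid, deviceid
                                     order by user_seq
                                     rows between unbounded preceding
                                              and unbounded following) as final_domain
        from t1
        -- keep only the first row of each domain per respondent/device
        where user_domain_seq = 1
    ) t2
    -- the last domain on the path has no next domain, so drop that row
    where t2.next_domain is not null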
SQL Fiddle: sqlfiddle.com/#!17/dc248/3
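On the 'prev_name' error itself: a window function is evaluated after the FROM, JOIN and WHERE clauses, and a SELECT-list alias is not visible in an ON or WHERE clause at the same query level. The usual pattern is to compute the window value in a subquery or CTE and filter one level up. The sketch below applies that pattern with lag(), keeping the first row of every consecutive run of the same domain. Unlike the user_domain_seq = 1 filter, it would also keep a domain the user returns to later. The temp table, its column types and the names flagged, run_starts, paths and prev_domain are assumptions made for illustration; the rows are the sample data from the question.

```sql
-- Hypothetical scratch table; column types are assumptions.
create temp table traffic_mobile (
    recordid     integer primary key,
    respondentid uuid,
    deviceid     uuid,
    utctimestamp timestamp,
    domain       text
);

-- Sample rows from the question. The timestamp literals are kept as written;
-- how day and month are read depends on the session's DateStyle setting.
insert into traffic_mobile (recordid, respondentid, deviceid, utctimestamp, domain)
select v.recordid,
       '01faca75-1216-4a55-b43c-9d64ade852f7'::uuid,
       '4DF57C06-F0BD-4779-8983-37A8B02E5EDF'::uuid,
       v.ts::timestamp,
       v.domain
from (values
    (1, '06/11/2017 10:21', 'goodreads.com'),
    (2, '06/11/2017 10:21', 'goodreads.com'),
    (3, '06/11/2017 10:21', 'gr-assets.com'),
    (4, '06/11/2017 10:21', 'gr-assets.com'),
    (5, '06/11/2017 10:23', 'itunes.apple.com'),
    (6, '06/11/2017 10:23', 'itunes.apple.com'),
    (7, '06/11/2017 10:51', 'samplicio.us'),
    (8, '06/11/2017 10:51', 'samplicio.us')
) as v(recordid, ts, domain);

with flagged as (
    -- Step 1: compute the previous row's domain with a window function.
    select t.*,
           lag(domain) over (partition by respondentid, deviceid
                             order by utctimestamp, recordid) as prev_domain
    from traffic_mobile t
),
run_starts as (
    -- Step 2: one level up, prev_domain is an ordinary column and can be
    -- filtered. "is distinct from" also keeps the first row (prev is NULL).
    select *
    from flagged
    where prev_domain is distinct from domain
)
select *
from (
    select recordid,
           respondentid,
           utctimestamp,
           domain as source_domain,
           lead(domain) over (partition by respondentid, deviceid
                              order by utctimestamp, recordid) as to_domain,
           last_value(domain) over (partition by respondentid, deviceid
                                    order by utctimestamp, recordid
                                    rows between unbounded preceding
                                             and unbounded following) as final_destination
    from run_starts
) paths
-- Step 3: the last run has no following domain, so it is dropped.
where to_domain is not null
order by recordid;
```

On the sample rows this keeps RecordID 1, 3 and 5, with samplicio.us as the final destination of each, which matches the result asked for in the question.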
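PS. For users that have only 1 entry in the traffic_mobile table, the query does not return a row. If you need those users, the query has to be extended to include them.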
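One possible way to include such users, shown only as a sketch: count how many rows survive the first-row-per-domain filter for each respondent/device, and keep a row either when it has a next domain or when it is the only surviving row. The table traffic_single_demo, its shortened text ids, its timestamps and the column domains_kept are all hypothetical and exist only to make the example self-contained.

```sql
-- Hypothetical scratch data: respondent 'a' visits two domains,
-- respondent 'b' has a single row. Ids are shortened to text for brevity.
create temp table traffic_single_demo (
    recordid     integer primary key,
    respondentid text,
    deviceid     text,
    utctimestamp timestamp,
    domain       text
);

insert into traffic_single_demo values
    (1, 'a', 'd1', '2017-11-06 10:21', 'goodreads.com'),
    (2, 'a', 'd1', '2017-11-06 10:23', 'itunes.apple.com'),
    (3, 'b', 'd2', '2017-11-06 10:30', 'samplicio.us');

with t1 as (
    select *,
           row_number() over (partition by respondentid, deviceid, domain
                              order by utctimestamp, recordid) as user_domain_seq
    from traffic_single_demo
)
select recordid, respondentid, deviceid, utctimestamp,
       domain, next_domain, final_domain
from (
    select t1.*,
           lead(domain) over (partition by respondentid, deviceid
                              order by utctimestamp, recordid) as next_domain,
           last_value(domain) over (partition by respondentid, deviceid
                                    order by utctimestamp, recordid
                                    rows between unbounded preceding
                                             and unbounded following) as final_domain,
           -- number of rows left for this respondent/device after the filter
           count(*) over (partition by respondentid, deviceid) as domains_kept
    from t1
    where user_domain_seq = 1
) t2
where next_domain is not null   -- normal case: a transition exists
   or domains_kept = 1          -- lone domain: keep it with a NULL next_domain
order by recordid;
```

With this data the query returns RecordID 1 (goodreads.com to itunes.apple.com) and RecordID 3 (samplicio.us with a NULL next_domain). A user with several rows that all share one domain is treated the same way as a single-entry user, since only one row survives the filter.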