With reference to this post: link, I used the answer provided by @Gordon Linoff:
select taxi, count(*)
from (select t.taxi, t.client, count(*) as num_times
      from (select t.*,
                   row_number() over (partition by taxi order by time) as seqnum,
                   row_number() over (partition by taxi, client order by time) as seqnum_c
            from t
           ) t
      group by t.taxi, t.client, (seqnum - seqnum_c)
      having count(*) >= 2
     ) x
group by taxi;
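To see why the `seqnum - seqnum_c` difference identifies runs, here is a small sketch that executes the query above in SQLite through Python's standard `sqlite3` module. The table name `t`, the column types and the sample rows are hypothetical (the question does not show its data); they were chosen only to reproduce the Tom 3 / Bob 1 result. It assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data: (taxi, client, time). ISO-8601 text sorts
# chronologically, so "order by time" works on the string column.
ROWS = [
    ("Tom", "A", "2020-01-01 08:00"),
    ("Tom", "A", "2020-01-01 08:30"),  # AA  -> one run
    ("Tom", "B", "2020-01-01 09:00"),
    ("Tom", "A", "2020-01-01 09:30"),
    ("Tom", "A", "2020-01-01 10:00"),
    ("Tom", "A", "2020-01-01 10:30"),  # AAA -> one run
    ("Tom", "C", "2020-01-01 11:00"),
    ("Tom", "B", "2020-01-01 11:30"),
    ("Tom", "B", "2020-01-01 12:00"),  # BB  -> one run
    ("Bob", "D", "2020-01-01 08:00"),
    ("Bob", "D", "2020-01-01 08:45"),  # DD  -> one run
    ("Bob", "E", "2020-01-01 09:30"),
]

# The query from the question; "runs" is an alias for the derived table.
QUERY = """
select taxi, count(*)
from (select t.taxi, t.client, count(*) as num_times
      from (select t.*,
                   row_number() over (partition by taxi order by time) as seqnum,
                   row_number() over (partition by taxi, client order by time) as seqnum_c
            from t
           ) t
      group by t.taxi, t.client, (seqnum - seqnum_c)
      having count(*) >= 2
     ) runs
group by taxi
"""


def count_runs(rows):
    """Return {taxi: number of runs of >= 2 consecutive rides by one client}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (taxi text, client text, time text)")
        conn.executemany("insert into t values (?, ?, ?)", rows)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


print(sorted(count_runs(ROWS).items()))  # [('Bob', 1), ('Tom', 3)]
```

Within one client's rows, `seqnum` and `seqnum_c` both grow by one per ride as long as no other client intervenes, so their difference is constant across a run and changes as soon as the run is broken.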
This gives me exactly the result I want:
Tom 3 (AA counts as 1, AAA counts as 1 and BB counts as 1, for a total of 3)
Bob 1
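But now I want to add one more condition: the time between two consecutive clients of the same taxi must not exceed 2 hours.
I know I should probably use row_number() again and compute the time difference with datediff, but I don't know where or how to add it.
Any suggestions?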
Answer 0 (score: 0)
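This needs a bit more logic. Here I would use lag() to define the groups: a new group starts whenever the client changes or more than 2 hours have passed since the taxi's previous ride.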
select taxi, count(*)
from (select t.taxi, t.client, count(*) as num_times
      from (select t.*,
                   -- 1 marks the start of a new run: a different client, or the
                   -- same client more than 2 hours after the taxi's previous ride
                   sum(case when prev_client = client and
                                 prev_time >= time - interval '2 hour'
                            then 0
                            else 1
                       end) over (partition by taxi order by time) as grp
            from (select t.*,
                         lag(client) over (partition by taxi order by time) as prev_client,
                         lag(time) over (partition by taxi order by time) as prev_time
                  from t
                 ) t
           ) t
      group by t.taxi, t.client, grp
      having count(*) >= 2
     ) x
group by taxi;
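Note: You didn't specify your database, so this uses ISO/ANSI standard syntax for the date/time comparison. You may need to adapt it to the database you actually use.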
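As one possible adaptation, here is a sketch for SQLite, run through Python's standard `sqlite3` module (SQLite 3.25+ assumed for window functions). SQLite has no `interval` literal, so the 2-hour check compares whole seconds from `strftime('%s', ...)`. The flag is 1 when a row starts a new run and 0 when it continues the same client within 2 hours, and the running sum is partitioned by `taxi`. The table `t`, its column types and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (taxi, client, time) as ISO-8601 text.
ROWS = [
    ("Tom", "A", "2020-01-01 08:00"),
    ("Tom", "A", "2020-01-01 08:30"),  # 30 min apart   -> run
    ("Tom", "B", "2020-01-01 09:00"),
    ("Tom", "A", "2020-01-01 09:30"),
    ("Tom", "A", "2020-01-01 12:00"),  # 2.5 h apart    -> not a run
    ("Tom", "C", "2020-01-01 12:30"),
    ("Tom", "B", "2020-01-01 13:00"),
    ("Tom", "B", "2020-01-01 14:00"),  # 1 h apart      -> run
    ("Bob", "D", "2020-01-01 08:00"),
    ("Bob", "D", "2020-01-01 09:45"),  # 1 h 45 min     -> run
    ("Bob", "E", "2020-01-01 10:00"),
]

QUERY = """
select taxi, count(*)
from (select taxi, client, count(*) as num_times
      from (select t.*,
                   -- 0 = same client as the taxi's previous ride and at most
                   -- 7200 s (2 h) later; 1 = a new run starts on this row
                   sum(case when prev_client = client
                             and cast(strftime('%s', time) as integer)
                               - cast(strftime('%s', prev_time) as integer) <= 7200
                            then 0 else 1 end)
                       over (partition by taxi order by time) as grp
            from (select t.*,
                         lag(client) over (partition by taxi order by time) as prev_client,
                         lag(time) over (partition by taxi order by time) as prev_time
                  from t) t
           ) t
      group by taxi, client, grp
      having count(*) >= 2
     ) runs
group by taxi
"""


def count_runs_within_2h(rows):
    """Return {taxi: runs of >= 2 consecutive same-client rides, each <= 2 h apart}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (taxi text, client text, time text)")
        conn.executemany("insert into t values (?, ?, ?)", rows)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


print(sorted(count_runs_within_2h(ROWS).items()))  # [('Bob', 1), ('Tom', 2)]
```

Comparing integer seconds (rather than fractional `julianday()` values) keeps a gap of exactly 2 hours on the "allowed" side without floating-point surprises.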