Question

In reference to this post: link, I used the answer provided by @Gordon Linoff:

select taxi, count(*)
from (select t.taxi, t.client, count(*) as num_times
      from (select t.*,
                   row_number() over (partition by taxi order by time) as seqnum,
                   row_number() over (partition by taxi, client order by time) as seqnum_c
            from t
           ) t
      group by t.taxi, t.client, (seqnum - seqnum_c)
      having count(*) >= 2
    )
group by taxi;

This gives me exactly the answer I wanted:

Tom    3  (AA count as 1, AAA count as 1 and BB count as 1, so total of 3 count)
Bob    1
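
To see why grouping by (seqnum - seqnum_c) isolates consecutive runs, here is a minimal Python sketch of the same bookkeeping. The ride list is hypothetical (the original sample data is not shown in this post) and was chosen only so that it reproduces the counts above; `count_repeat_runs` is an illustrative name, not part of any library.

```python
from collections import Counter

# Hypothetical rides, already sorted by time within each taxi: (taxi, client).
RIDES = [
    ("Tom", "A"), ("Tom", "A"), ("Tom", "B"),
    ("Tom", "A"), ("Tom", "A"), ("Tom", "A"),
    ("Tom", "C"), ("Tom", "B"), ("Tom", "B"),
    ("Bob", "D"), ("Bob", "D"), ("Bob", "E"),
]

def count_repeat_runs(rides, min_len=2):
    """Count, per taxi, the runs of at least min_len consecutive rides with one client."""
    seqnum = Counter()     # mirrors row_number() over (partition by taxi)
    seqnum_c = Counter()   # mirrors row_number() over (partition by taxi, client)
    run_sizes = Counter()  # rows per (taxi, client, seqnum - seqnum_c)
    for taxi, client in rides:
        seqnum[taxi] += 1
        seqnum_c[taxi, client] += 1
        # Within an unbroken run both counters advance together, so the
        # difference stays constant; it changes once another client intervenes.
        diff = seqnum[taxi] - seqnum_c[taxi, client]
        run_sizes[taxi, client, diff] += 1
    per_taxi = Counter()
    for (taxi, _client, _diff), size in run_sizes.items():
        if size >= min_len:  # mirrors "having count(*) >= 2"
            per_taxi[taxi] += 1
    return dict(per_taxi)

print(count_repeat_runs(RIDES))  # {'Tom': 3, 'Bob': 1}
```

Note that this bookkeeping looks only at the order of rides, not at the time between them, which is why the 2-hour condition needs a different grouping.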

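Now I want to add one more condition: the time between two consecutive clients of the same taxi must not exceed 2 hours.

I know I probably need to use row_number() again and compute the time difference with datediff, but I don't know where to add it or how to do it.

Any suggestions?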

Answer 1

This requires a bit more logic. In this case, I would use lag() to calculate the groups:

select taxi, count(*)
from (select t.taxi, t.client, count(*) as num_times
      from (select t.*,
                   -- running count of run starts per taxi: a new run starts when the
                   -- client changes or the previous ride was 2 hours or more earlier
                   sum(case when prev_client = client and 
                                 prev_time > time - interval '2 hour'
                            then 0
                            else 1
                       end) over (partition by taxi order by time) as grp
            from (select t.*,
                         lag(client) over (partition by taxi order by time) as prev_client,
                         lag(time) over (partition by taxi order by time) as prev_time
                  from t
                 ) t
           ) t
      group by t.taxi, t.client, grp
      having count(*) >= 2
    ) x
group by taxi;

Note: You did not specify the database, so this uses ISO/ANSI standard syntax for the date/time comparison. You can adjust it for your actual database.
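
As one example of such an adjustment, the sketch below runs a SQLite variant of the lag()-based query through Python's sqlite3 module. Several things here are assumptions rather than details from the question: the table is named t with columns taxi, client and time stored as ISO-8601 text, the sample rows are invented, and the interval comparison is replaced by a difference in epoch seconds via strftime('%s', ...). The grouping marks the first row of each run with 1 and takes a running sum per taxi. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data (taxi, client, time); not taken from the original post.
ROWS = [
    ("Tom", "A", "2024-01-01 08:00:00"),
    ("Tom", "A", "2024-01-01 09:00:00"),  # 1 h after the previous A: same run
    ("Tom", "A", "2024-01-01 13:00:00"),  # 4 h gap: starts a new run of A
    ("Tom", "B", "2024-01-01 13:30:00"),
    ("Tom", "B", "2024-01-01 14:00:00"),  # 30 min after the previous B: same run
    ("Bob", "C", "2024-01-01 08:00:00"),
    ("Bob", "C", "2024-01-01 11:00:00"),  # 3 h gap: no qualifying run
]

QUERY = """
select taxi, count(*) as num_runs
from (select taxi, client, grp
      from (select t.*,
                   -- 1 marks the start of a new run, 0 continues the current one
                   sum(case when prev_client = client
                             and cast(strftime('%s', time) as integer)
                                 - cast(strftime('%s', prev_time) as integer) < 7200
                            then 0 else 1
                       end) over (partition by taxi order by time) as grp
            from (select t.*,
                         lag(client) over (partition by taxi order by time) as prev_client,
                         lag(time)   over (partition by taxi order by time) as prev_time
                  from t
                 ) t
           ) t
      group by taxi, client, grp
      having count(*) >= 2
     ) runs
group by taxi
order by taxi
"""

def count_runs(rows):
    """Load rows into an in-memory SQLite table and run the grouping query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (taxi text, client text, time text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(count_runs(ROWS))  # [('Tom', 2)]
```

With this data Tom has two qualifying runs (the first two A rides and the two B rides), while the third A ride and both of Bob's rides fall outside the 2-hour window. Like the query above, the sketch uses a strict comparison, so a gap of exactly 2 hours starts a new run; change `<` to `<=` if such a gap should still count.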

Count consecutive records with timestamp interval requirement
