How do I rank a column in SQL based on the day difference between rows and a partition?

Time: 2020-04-19 23:07:36

Tags: mysql sql mysql-workbench

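I am trying to compute a RANK() over a column, based on whether the day difference between consecutive rows is less than 3.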

select hotel.*,
       IFNULL(datediff(visit_date, lag(visit_date)
       OVER (partition by hotel_id order by visit_date)), 0) as diff
from hotel;

I get the following output:

hotel_id customer_id  visit_date  diff
1            1        2020-01-01    0
1            2        2020-01-03    2
2            1        2020-01-01    0
2            2        2020-01-10    9
2            3        2020-01-14    4
3            1        2020-01-04    0
3            1        2020-01-11    7

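I am confused about the RANK() part.

Expected output: within each hotel, the rank is 1 as long as the day difference to the previous visit is less than 3; otherwise it becomes 2. If the next difference is again greater than 3 days, it becomes 3, and so on.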

hotel_id customer_id  visit_date  rank
1            1        2020-01-01    1
1            2        2020-01-03    1
2            1        2020-01-01    1
2            2        2020-01-10    2
2            3        2020-01-14    3
3            1        2020-01-04    1
3            1        2020-01-11    2

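3 answers:

Answer 0 (score: 1)

You can use this query to generate your rank values. It uses two CTEs: the first generates a row number for each visit (per hotel), and the second (recursive) CTE generates the rank values by walking through the rows of the first CTE in order, incrementing the rank only when the date difference is more than 2 days.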

WITH RECURSIVE hotel_rows AS (
  SELECT hotel_id, customer_id, visit_date,
         ROW_NUMBER() OVER (PARTITION BY hotel_id ORDER BY visit_date) AS rn
  FROM hotel
  ORDER BY hotel_id, visit_date
),
ranks AS (
  SELECT hotel_id, customer_id, visit_date, rn, 1 AS `rank`
  FROM hotel_rows
  WHERE rn = 1
  UNION ALL
  SELECT h.hotel_id, h.customer_id, h.visit_date, h.rn,
         r.rank + (h.visit_date > r.visit_date + INTERVAL 2 DAY)
  FROM hotel_rows h
  JOIN ranks r ON h.hotel_id = r.hotel_id
              AND h.rn = r.rn + 1
)
SELECT hotel_id, customer_id, visit_date, `rank`
FROM ranks
ORDER BY hotel_id, visit_date;

Output (for my slightly extended demo):

hotel_id    customer_id     visit_date  rank
1           1               2020-01-01  1
1           2               2020-01-03  1
2           1               2020-01-01  1
2           2               2020-01-10  2
2           3               2020-01-14  3
2           1               2020-01-15  3
2           2               2020-01-20  4
3           1               2020-01-04  1
3           1               2020-01-11  2

Demo on dbfiddle
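
For readers who want to check the logic outside the database, here is a minimal Python sketch of the same walk: rows are ordered per hotel, the first row gets rank 1, and the rank is bumped whenever a visit is more than 2 days after the previous one. The function name `rank_visits` and the parameter `max_gap_days` are made up for this illustration, and the data is the extended demo table shown above; it is a sketch of the idea, not a replacement for the query.

```python
from datetime import date, timedelta
from itertools import groupby

# Rows from the extended demo output: (hotel_id, customer_id, visit_date).
visits = [
    (1, 1, date(2020, 1, 1)),
    (1, 2, date(2020, 1, 3)),
    (2, 1, date(2020, 1, 1)),
    (2, 2, date(2020, 1, 10)),
    (2, 3, date(2020, 1, 14)),
    (2, 1, date(2020, 1, 15)),
    (2, 2, date(2020, 1, 20)),
    (3, 1, date(2020, 1, 4)),
    (3, 1, date(2020, 1, 11)),
]


def rank_visits(rows, max_gap_days=2):
    """Walk each hotel's visits in date order and bump the rank whenever the
    gap to the previous visit exceeds max_gap_days (hypothetical parameter)."""
    ranked = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        rank, prev_date = 1, None  # the first row of a hotel plays the role of rn = 1
        for hotel_id, customer_id, visit_date in group:
            # Same test as: h.visit_date > r.visit_date + INTERVAL 2 DAY
            if prev_date is not None and visit_date > prev_date + timedelta(days=max_gap_days):
                rank += 1
            ranked.append((hotel_id, customer_id, visit_date, rank))
            prev_date = visit_date
    return ranked


ranks = [row[3] for row in rank_visits(visits)]
print(ranks)
```

The printed list should line up row by row with the rank column of the output table above.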

Answer 1 (score: 0)

If you want results according to the given condition, you can try the following in SQL Server. Here is a demo.

select
  hotel_id, 
  customer_id, 
  visit_date,
  case 
    when days < 3 then 1
    else 2
  end as rnk
from
(
  select
    *,
    datediff(day, n_date, visit_date) as days
  from
  (
      select
        *,
        coalesce(lag(visit_date) over (partition by hotel_id order by visit_date), visit_date) as n_date

      from hotel
  ) val
)days
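
One thing worth checking with this approach is that a per-row CASE can only ever produce 1 or 2, so it cannot reach the 3 that the question expects for hotel 2. The Python sketch below mimics the query on the question's sample rows; `per_row_labels` is a name invented for this illustration.

```python
from datetime import date

# Sample rows from the question: (hotel_id, customer_id, visit_date).
visits = [
    (1, 1, date(2020, 1, 1)),
    (1, 2, date(2020, 1, 3)),
    (2, 1, date(2020, 1, 1)),
    (2, 2, date(2020, 1, 10)),
    (2, 3, date(2020, 1, 14)),
    (3, 1, date(2020, 1, 4)),
    (3, 1, date(2020, 1, 11)),
]


def per_row_labels(rows):
    """Label a row 1 if it is fewer than 3 days after the previous visit to
    the same hotel, otherwise 2 (mirrors the CASE expression)."""
    labels = []
    previous_by_hotel = {}  # hotel_id -> date of the previous visit
    for hotel_id, _, visit_date in sorted(rows, key=lambda r: (r[0], r[2])):
        # coalesce(lag(...), visit_date): a hotel's first row compares with itself
        previous = previous_by_hotel.get(hotel_id, visit_date)
        days = (visit_date - previous).days
        labels.append(1 if days < 3 else 2)
        previous_by_hotel[hotel_id] = visit_date
    return labels


labels = per_row_labels(visits)
expected = [1, 1, 1, 2, 3, 1, 2]  # rank column from the question
print(labels)
print(labels == expected)
```

The two lists differ only at the third visit to hotel 2 (2020-01-14), where the question expects 3 and the per-row label is 2, which suggests a running count of gaps is needed rather than a row-local test.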

Answer 2 (score: 0)

I would phrase it like this:

select h.*,
       (case when lag(visit_date) over (partition by hotel_id order by visit_date) < visit_date - interval 3 day
             then 2 else 1
       end)
from hotel h;

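Edit:

Based on your edit, you want to assign groups based on the date difference and then use row_number(). The query below computes the group number as 1 plus a running sum of a flag that is 1 whenever a visit is more than 3 days after the previous one: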

select h.*,
       1 + sum( coalesce(visit_date > prev_vd + interval 3 day, 0) ) over (partition by hotel_id order by visit_date) as grp
from (select h.*,
             lag(visit_date) over (partition by hotel_id order by visit_date) as prev_vd
      from hotel h
     ) h;

Here is a db<>fiddle.
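
To make the flag-and-running-sum idea concrete, here is a small Python sketch that follows the same three steps: take the previous visit per hotel (the lag), flag gaps of more than 3 days, and add 1 to the cumulative sum of the flags. The name `group_numbers` and the parameter `gap_days` are assumptions for this example, and the data is the sample from the question.

```python
from datetime import date, timedelta
from itertools import accumulate, groupby

# Sample rows from the question: (hotel_id, customer_id, visit_date).
visits = [
    (1, 1, date(2020, 1, 1)),
    (1, 2, date(2020, 1, 3)),
    (2, 1, date(2020, 1, 1)),
    (2, 2, date(2020, 1, 10)),
    (2, 3, date(2020, 1, 14)),
    (3, 1, date(2020, 1, 4)),
    (3, 1, date(2020, 1, 11)),
]


def group_numbers(rows, gap_days=3):
    """Return one group number per row, ordered by hotel and visit date."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        dates = [r[2] for r in group]
        # lag(visit_date): no previous visit for the first row of a hotel
        prev_dates = [None] + dates[:-1]
        # coalesce(visit_date > prev_vd + interval 3 day, 0)
        flags = [
            int(prev is not None and current > prev + timedelta(days=gap_days))
            for current, prev in zip(dates, prev_dates)
        ]
        # 1 + sum(flag) over (partition by hotel_id order by visit_date)
        result.extend(1 + running for running in accumulate(flags))
    return result


grp = group_numbers(visits)
print(grp)
```

With this strict comparison, a gap of exactly 3 days stays in the current group, whereas the question's wording ("less than 3") could be read as starting a new one; calling `group_numbers(visits, gap_days=2)` would give that reading.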
