I am trying to compute a RANK() over a column, based on whether the difference between consecutive rows is less than 3.
-- days since the previous visit to the same hotel (0 for the first visit)
select hotel.*,
       IFNULL(datediff(visit_date,
                       lag(visit_date) OVER (partition by hotel_id order by visit_date)), 0) as diff
from hotel;
I get the following output:
hotel_id customer_id visit_date diff
1 1 2020-01-01 0
1 2 2020-01-03 2
2 1 2020-01-01 0
2 2 2020-01-10 9
2 3 2020-01-14 4
3 1 2020-01-04 0
3 1 2020-01-11 7
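I am confused about the RANK() part.
Expected output: the rank is 1 if the difference in days is less than 3, otherwise 2. If the next difference is again more than 3 days, the rank becomes 3, and so on.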
hotel_id customer_id visit_date rank
1 1 2020-01-01 1
1 2 2020-01-03 1
2 1 2020-01-01 1
2 2 2020-01-10 2
2 3 2020-01-14 3
3 1 2020-01-04 1
3 1 2020-01-11 2
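Answer 0 (score: 1)
You can use this query to generate your rank values. It uses two CTEs: the first generates a row number for each visit (per hotel), and the second (recursive) CTE generates the rank values by walking through the rows from the first CTE in order and incrementing rank only when the date difference exceeds 2 days: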
WITH RECURSIVE hotel_rows AS (
SELECT hotel_id, customer_id, visit_date,
ROW_NUMBER() OVER (PARTITION BY hotel_id ORDER BY visit_date) AS rn
FROM hotel
ORDER BY hotel_id, visit_date
),
ranks AS (
SELECT hotel_id, customer_id, visit_date, rn, 1 AS `rank`
FROM hotel_rows
WHERE rn = 1
UNION ALL
SELECT h.hotel_id, h.customer_id, h.visit_date, h.rn,
r.rank + (h.visit_date > r.visit_date + INTERVAL 2 DAY)
FROM hotel_rows h
JOIN ranks r ON h.hotel_id = r.hotel_id
AND h.rn = r.rn + 1
)
SELECT hotel_id, customer_id, visit_date, `rank`
FROM ranks
ORDER BY hotel_id, visit_date
Output (for my slightly extended demo):
hotel_id customer_id visit_date rank
1 1 2020-01-01 1
1 2 2020-01-03 1
2 1 2020-01-01 1
2 2 2020-01-10 2
2 3 2020-01-14 3
2 1 2020-01-15 3
2 2 2020-01-20 4
3 1 2020-01-04 1
3 1 2020-01-11 2
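For readers who find the recursive CTE hard to follow, the same row-by-row walk can be sketched in plain Python. This is only an illustration and not part of the original answer: the function name `rank_visits` and the `max_gap_days` parameter are hypothetical, and the sample rows are copied from the demo output above.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows copied from the extended demo output: (hotel_id, customer_id, visit_date)
visits = [
    (1, 1, date(2020, 1, 1)),
    (1, 2, date(2020, 1, 3)),
    (2, 1, date(2020, 1, 1)),
    (2, 2, date(2020, 1, 10)),
    (2, 3, date(2020, 1, 14)),
    (2, 1, date(2020, 1, 15)),
    (2, 2, date(2020, 1, 20)),
    (3, 1, date(2020, 1, 4)),
    (3, 1, date(2020, 1, 11)),
]


def rank_visits(rows, max_gap_days=2):
    """Hypothetical helper: walk each hotel's visits in date order and
    increment the rank whenever the gap to the previous visit exceeds
    max_gap_days (the same condition as the recursive CTE)."""
    ranked = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for hotel_id, group in groupby(ordered, key=lambda r: r[0]):
        rank, prev_date = 1, None  # every hotel starts again at rank 1
        for _, customer_id, visit_date in group:
            if prev_date is not None and visit_date > prev_date + timedelta(days=max_gap_days):
                rank += 1
            ranked.append((hotel_id, customer_id, visit_date, rank))
            prev_date = visit_date
    return ranked


ranked = rank_visits(visits)
for hotel_id, customer_id, visit_date, rank in ranked:
    print(hotel_id, customer_id, visit_date.isoformat(), rank)
```

The `prev_date` variable plays the role of the join `h.rn = r.rn + 1`: each row is compared only with the row immediately before it in the same hotel.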
Answer 1 (score: 0)
If you want the result according to the stated condition, you can try the following in SQL Server. Here is the Demo.
select
hotel_id,
customer_id,
visit_date,
case
when days < 3 then 1
else 2
end as rnk
from
(
select
*,
datediff(day, n_date, visit_date) as days
from
(
select
*,
coalesce(lag(visit_date) over (partition by hotel_id order by visit_date), visit_date) as n_date
from hotel
) val
)days
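Note that the CASE expression in this query can only produce 1 or 2, so it acts as a per-row flag rather than a running rank. A small Python sketch of the same logic, run on the question's sample rows, makes this visible. The function name `two_level_flag` is hypothetical, and the code is an illustration under that reading of the query, not something verified against SQL Server.

```python
from datetime import date

# Sample rows from the question: (hotel_id, customer_id, visit_date)
rows = [
    (1, 1, date(2020, 1, 1)),
    (1, 2, date(2020, 1, 3)),
    (2, 1, date(2020, 1, 1)),
    (2, 2, date(2020, 1, 10)),
    (2, 3, date(2020, 1, 14)),
    (3, 1, date(2020, 1, 4)),
    (3, 1, date(2020, 1, 11)),
]


def two_level_flag(rows):
    """Hypothetical helper mirroring the query: 1 if fewer than 3 days have
    passed since the previous visit to the same hotel, otherwise 2."""
    flags = []
    prev = {}  # hotel_id -> previous visit_date for that hotel
    for hotel_id, _, visit_date in sorted(rows, key=lambda r: (r[0], r[2])):
        # Like coalesce(lag(...), visit_date): the first visit compares with itself.
        days = (visit_date - prev.get(hotel_id, visit_date)).days
        flags.append(1 if days < 3 else 2)
        prev[hotel_id] = visit_date
    return flags


flags = two_level_flag(rows)
print(flags)
```

On this data the third visit to hotel 2 gets the value 2, whereas the expected output in the question asks for 3 on that row.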
Answer 2 (score: 0)
I would express it like this:
select h.*,
(case when lag(visit_date) over (partition by hotel_id order by visit_date) < visit_date - interval 3 day
then 2 else 1
end)
from hotel h;
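EDIT:
Based on your edit, you want to assign groups based on the date difference. The query below does this with a cumulative sum() over a flag that marks each gap: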
select h.*,
1 + sum( coalesce(visit_date > prev_vd + interval 3 day, 0) ) over (partition by hotel_id order by visit_date) as grp
from (select h.*,
lag(visit_date) over (partition by hotel_id order by visit_date) as prev_vd
from hotel h
) h;
Here is a db<>fiddle.
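The running-sum idea in the second query can also be sketched outside SQL. The Python below is a minimal illustration, assuming the question's sample rows; the function name `assign_groups` and the `gap_days` parameter are hypothetical. It flags each row whose date is later than the previous visit plus 3 days, then accumulates the flags per hotel.

```python
from datetime import date, timedelta
from itertools import accumulate, groupby

# Sample rows from the question: (hotel_id, customer_id, visit_date)
rows = [
    (1, 1, date(2020, 1, 1)),
    (1, 2, date(2020, 1, 3)),
    (2, 1, date(2020, 1, 1)),
    (2, 2, date(2020, 1, 10)),
    (2, 3, date(2020, 1, 14)),
    (3, 1, date(2020, 1, 4)),
    (3, 1, date(2020, 1, 11)),
]


def assign_groups(rows, gap_days=3):
    """Hypothetical helper: 1 + running count of gaps, computed per hotel."""
    groups = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for _, hotel_rows in groupby(ordered, key=lambda r: r[0]):
        dates = [r[2] for r in hotel_rows]
        # The first row has no previous visit, so its flag is 0 (like coalesce(..., 0)).
        flags = [0] + [
            int(cur > prev + timedelta(days=gap_days))
            for prev, cur in zip(dates, dates[1:])
        ]
        # accumulate() plays the role of sum(...) over (... order by visit_date).
        groups.extend(1 + total for total in accumulate(flags))
    return groups


groups = assign_groups(rows)
print(groups)
```

Because no row looks at the previously computed group, only at the previous date, this form needs no recursion.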