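I'm having trouble getting the correct count for each user's longest streak. A streak is a run of consecutive days on which the user has checked in.
Any help would be greatly appreciated. Here's a fiddle with my script and sample data: http://sqlfiddle.com/#!17/d2825/1/0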
check_ins table:
user_id | goal_id  | check_in_date
--------+----------+--------------------
colt    | 40365fa0 | 2019-01-07 15:35:53
colt    | d31efe70 | 2019-01-11 15:35:52
berry   | be2fcd50 | 2019-01-12 15:35:51
colt    | e754d050 | 2019-01-13 15:17:16
colt    | 9c87a7f0 | 2019-01-14 15:35:54
colt    | ucgtdes0 | 2019-01-15 12:30:59
PostgreSQL script:
WITH dates(DATE) AS
  (SELECT DISTINCT Cast(check_in_date AS DATE),
          user_id
   FROM check_ins),
GROUPS AS
  (SELECT Row_number() OVER (ORDER BY DATE) AS rn,
          DATE - (Row_number() OVER (ORDER BY DATE) * interval '1' DAY) AS grp,
          DATE,
          user_id
   FROM dates)
SELECT Count(*) AS streak,
       user_id
FROM GROUPS
GROUP BY grp,
         user_id
ORDER BY 1 DESC;
When I run the code above, this is what I get:
streak user_id
--------------
4 colt
1 colt
1 berry
What it should be (I also want to get only the longest streak for each user):
streak user_id
--------------
3 colt
1 berry
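A minimal plain-Python sketch of what the query above does shows where the 4 comes from. The names CHECK_INS and question_query are illustrative and do not come from the fiddle.

Because ROW_NUMBER() is not partitioned by user_id, berry's 2019-01-12 row takes one number in the shared sequence. As a result, colt's 2019-01-11 row gets the same grp (2019-01-09) as colt's 2019-01-13 to 2019-01-15 rows.

```python
from collections import Counter
from datetime import date, timedelta

# Sample rows from the check_ins table, with the time of day dropped (like the CAST to DATE).
CHECK_INS = [
    ("colt", date(2019, 1, 7)),
    ("colt", date(2019, 1, 11)),
    ("berry", date(2019, 1, 12)),
    ("colt", date(2019, 1, 13)),
    ("colt", date(2019, 1, 14)),
    ("colt", date(2019, 1, 15)),
]


def question_query(rows):
    """Mimic the question's query: one ROW_NUMBER() sequence shared by all users."""
    # SELECT DISTINCT date, user_id ... ordered by date (no ties in the sample data)
    dates = sorted(set(rows), key=lambda r: (r[1], r[0]))
    groups = Counter()
    for rn, (user, day) in enumerate(dates, start=1):
        grp = day - timedelta(days=rn)  # DATE - rn * interval '1 day'
        groups[(grp, user)] += 1        # GROUP BY grp, user_id
    # ORDER BY streak DESC
    return sorted(((n, user) for (_, user), n in groups.items()), reverse=True)


print(question_query(CHECK_INS))  # [(4, 'colt'), (1, 'colt'), (1, 'berry')]
```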
Answer 0 (score: 1)
First of all, thanks for the fiddle and the sample data.
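You are not using the right row_number to solve the gaps-and-islands problem. For your data set it should be something like the following query. On top of that, to get the longest streak you need DISTINCT ON after grouping by the group number (grp in your query, which I call seq).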
I assume you only want to count distinct entries per user per day, so I changed the WITH clause slightly to reflect that.
SELECT * FROM (
  WITH check_ins_dt AS
    (SELECT DISTINCT check_in_date::DATE AS check_in_date,
            user_id
     FROM check_ins)
  SELECT DISTINCT ON (user_id) COUNT(*) AS streak, user_id
  FROM (
    SELECT c.*,
           ROW_NUMBER() OVER (ORDER BY check_in_date)
             - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY check_in_date) AS seq
    FROM check_ins_dt c
  ) s
  GROUP BY user_id,
           seq
  ORDER BY user_id,
           COUNT(*) DESC
) q
ORDER BY streak DESC;
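A plain-Python sketch of Answer 0's seq calculation reproduces the expected 3 and 1 on the sample data. The function names answer0_groups and longest_per_user are hypothetical.

Note, however, that the difference of the two ROW_NUMBER()s detects runs of consecutive rows for a user in the global date order, not consecutive calendar days. Colt's 2019-01-07 and 2019-01-11 check-ins also share a seq value and form a group of 2. On this data the longest group still happens to be the correct one.

```python
from collections import Counter
from datetime import date

# Distinct (user_id, check-in date) pairs from the sample data.
CHECK_INS = [
    ("colt", date(2019, 1, 7)),
    ("colt", date(2019, 1, 11)),
    ("berry", date(2019, 1, 12)),
    ("colt", date(2019, 1, 13)),
    ("colt", date(2019, 1, 14)),
    ("colt", date(2019, 1, 15)),
]


def answer0_groups(rows):
    """seq = ROW_NUMBER() over all rows minus ROW_NUMBER() within the user."""
    dates = sorted(set(rows), key=lambda r: (r[1], r[0]))
    per_user_rn = Counter()
    groups = {}
    for global_rn, (user, day) in enumerate(dates, start=1):
        per_user_rn[user] += 1
        seq = global_rn - per_user_rn[user]
        groups.setdefault((user, seq), []).append(day)  # GROUP BY user_id, seq
    return groups


def longest_per_user(groups):
    """DISTINCT ON (user_id) ... ORDER BY user_id, COUNT(*) DESC keeps the largest group."""
    best = {}
    for (user, _), days in groups.items():
        best[user] = max(best.get(user, 0), len(days))
    return best


groups = answer0_groups(CHECK_INS)
print(longest_per_user(groups))  # {'colt': 3, 'berry': 1}
print(groups[("colt", 0)])       # 2019-01-07 and 2019-01-11 land in the same group
```

As a hypothetical example, if a single user checked in only on 2019-01-01, 2019-01-05 and 2019-01-09, all three rows would get seq 0, and this technique would report a streak of 3.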
Answer 1 (score: 1)
In Postgres, you can write this as:
select distinct on (user_id) user_id, count(distinct check_in_date::date) as num_days
from (select ci.*,
             dense_rank() over (partition by user_id order by check_in_date::date) as seq
      from check_ins ci
     ) ci
group by user_id, check_in_date::date - seq * interval '1 day'
order by user_id, num_days desc;
Here is a db<>fiddle.
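This follows logic similar to your approach, but your query seems more complicated than necessary. It does use Postgres's distinct on functionality, which is handy for avoiding an extra subquery.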
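The same logic in plain Python, as a hedged sketch (longest_streaks is a hypothetical name), works in four steps:

1. Deduplicate the dates per user.
2. Number each user's distinct dates 1, 2, 3, and so on. This is what dense_rank() partitioned by user_id gives on the dates.
3. Subtract that number from the date, so the result stays constant exactly while the days are consecutive.
4. Keep the largest run per user, which is the row DISTINCT ON (user_id) with ORDER BY user_id, num_days DESC keeps.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample data: (user_id, check-in date) with the time of day dropped.
CHECK_INS = [
    ("colt", date(2019, 1, 7)),
    ("colt", date(2019, 1, 11)),
    ("berry", date(2019, 1, 12)),
    ("colt", date(2019, 1, 13)),
    ("colt", date(2019, 1, 14)),
    ("colt", date(2019, 1, 15)),
]


def longest_streaks(rows):
    """Longest run of consecutive check-in days per user (date minus per-user rank)."""
    by_user = {}
    for user, day in rows:
        by_user.setdefault(user, set()).add(day)  # same-day check-ins count once
    result = {}
    for user, days in by_user.items():
        ordered = sorted(days)
        # Rank 1, 2, 3, ... on the user's distinct days; day - rank is constant
        # within a run of consecutive days and increases after any gap.
        keys = [day - timedelta(days=rank) for rank, day in enumerate(ordered, start=1)]
        # Equal keys are adjacent because keys never decrease, so groupby finds each run.
        result[user] = max(len(list(run)) for _, run in groupby(keys))
    return result


print(longest_streaks(CHECK_INS))  # {'colt': 3, 'berry': 1}
```

Unlike the row-number difference, subtracting a per-user rank from the date only groups days that are actually adjacent on the calendar.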