Getting the longest streak count per user

Time: 2019-01-23 03:55:01

Tags: sql postgresql

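I'm having trouble getting the correct count for each user's longest streak. A streak is a run of consecutive days on which the user checked in.

Any help would be greatly appreciated. Here is a fiddle with my script and sample data: http://sqlfiddle.com/#!17/d2825/1/0

check_ins table: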

| user_id | goal_id  | check_in_date
|---------|----------|--------------------
| colt    | 40365fa0 | 2019-01-07 15:35:53
| colt    | d31efe70 | 2019-01-11 15:35:52
| berry   | be2fcd50 | 2019-01-12 15:35:51
| colt    | e754d050 | 2019-01-13 15:17:16
| colt    | 9c87a7f0 | 2019-01-14 15:35:54
| colt    | ucgtdes0 | 2019-01-15 12:30:59
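
To reproduce this outside the fiddle, the sample rows can be loaded with a setup along these lines. This is only a sketch: the column types (`text`, `timestamp`) are assumptions, since the table definition is not shown in the post.

```sql
-- Assumed schema: the post does not show the DDL, so these types are a guess.
CREATE TABLE check_ins (
    user_id       text,
    goal_id       text,
    check_in_date timestamp
);

-- Sample rows copied from the table above.
INSERT INTO check_ins (user_id, goal_id, check_in_date) VALUES
    ('colt',  '40365fa0', '2019-01-07 15:35:53'),
    ('colt',  'd31efe70', '2019-01-11 15:35:52'),
    ('berry', 'be2fcd50', '2019-01-12 15:35:51'),
    ('colt',  'e754d050', '2019-01-13 15:17:16'),
    ('colt',  '9c87a7f0', '2019-01-14 15:35:54'),
    ('colt',  'ucgtdes0', '2019-01-15 12:30:59');
```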

PostgreSQL script:

    WITH dates(DATE) AS
      (SELECT DISTINCT Cast(check_in_date AS DATE),
                       user_id
       FROM check_ins),
         GROUPS AS
      (SELECT Row_number() OVER (
                                ORDER BY DATE) AS rn, DATE - (Row_number() OVER (ORDER BY DATE) * interval '1' DAY) AS grp, DATE, user_id
       FROM dates)
    SELECT Count(*) AS streak,
           user_id
    FROM GROUPS
    GROUP BY grp,
             user_id
    ORDER BY 1 DESC; 

When I run the code above, this is what I get:

 streak user_id
 --------------
 4      colt
 1      colt
 1      berry

This is what it should be. I also want only the longest streak for each user.

 streak user_id
 --------------
 3      colt
 1      berry

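2 Answers:

Answer 0: (score: 1)

First of all, thank you for the fiddle with the script and sample data.

You are not using the right row_number for a gaps-and-islands problem. For your dataset it should look something like the query below. On top of that, to get only the longest streak per user you need DISTINCT ON after grouping by the group number (grp in your query, which I call seq).

I assume you only want to count distinct entries per user per day. I changed the WITH clause slightly to reflect that.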

    SELECT * FROM (
        WITH check_ins_dt AS (
            -- one row per user per calendar day
            SELECT DISTINCT check_in_date::DATE AS check_in_date,
                            user_id
            FROM check_ins
        )
        -- DISTINCT ON keeps the first row per user, i.e. the highest count
        SELECT DISTINCT ON (user_id) COUNT(*) AS streak, user_id
        FROM (
            SELECT c.*,
                   -- date minus per-user row number: consecutive days share one seq
                   c.check_in_date - (ROW_NUMBER() OVER (
                        PARTITION BY user_id
                        ORDER BY check_in_date
                   ))::INT AS seq
            FROM check_ins_dt c
        ) s
        GROUP BY user_id,
                 seq
        ORDER BY user_id,
                 COUNT(*) DESC
    ) q
    ORDER BY streak DESC;

Demo
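
To see why the row number has to be computed per user, it can help to list the grouping key for every row. The sketch below uses the date-minus-row-number form of the gaps-and-islands key (the same idea as grp in the question, but partitioned by user_id). The names `days`, `check_in_day`, `rn` and `grp` are only illustrative, and the query assumes the check_ins table from the question.

```sql
-- Illustrative only: shows the intermediate grouping key, not the final result.
WITH days AS (
    -- one row per user per calendar day
    SELECT DISTINCT user_id,
                    check_in_date::date AS check_in_day
    FROM check_ins
)
SELECT user_id,
       check_in_day,
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY check_in_day) AS rn,
       -- date - integer yields a date; consecutive days collapse to the same value
       check_in_day - (ROW_NUMBER() OVER (PARTITION BY user_id
                                          ORDER BY check_in_day))::int AS grp
FROM days
ORDER BY user_id, check_in_day;

-- With the sample data from the question:
--  user_id | check_in_day | rn | grp
--  berry   | 2019-01-12   |  1 | 2019-01-11
--  colt    | 2019-01-07   |  1 | 2019-01-06
--  colt    | 2019-01-11   |  2 | 2019-01-09
--  colt    | 2019-01-13   |  3 | 2019-01-10
--  colt    | 2019-01-14   |  4 | 2019-01-10
--  colt    | 2019-01-15   |  5 | 2019-01-10
```

The three colt rows from 2019-01-13 to 2019-01-15 share one grp value, which is the streak of 3. Without PARTITION BY user_id, berry's check-in shifts colt's row numbers, so 2019-01-11 falls into the same group and the count becomes 4.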

Answer 1: (score: 1)

In Postgres, you can write this as:

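    -- distinct on keeps the first row per user_id under the order by, i.e. the longest streak
    select distinct on (user_id) user_id, count(distinct check_in_date::date) as num_days
    from (select ci.*,
                 -- rank calendar days per user; check-ins on the same day share a rank
                 dense_rank() over (partition by user_id order by check_in_date::date) as seq
          from check_ins ci
         ) ci
    -- consecutive days minus their rank give the same value, so each streak forms one group
    group by user_id, check_in_date::date - seq * interval '1 day'
    order by user_id, num_days desc;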

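Here is a db<>fiddle.

This follows logic similar to your approach, but your query seems more complicated than necessary. It also uses the Postgres distinct on feature, which is a convenient way to avoid an extra subquery.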
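
For comparison, here is a sketch of how the same result might be written without distinct on. It needs one more level of aggregation, which is the extra subquery mentioned above. The alias `streaks` and the column name `longest_streak` are made up for illustration; the inner query is the same as in this answer.

```sql
-- Same grouping as above, but the per-user maximum is taken in an outer query
-- instead of with DISTINCT ON.
select user_id, max(num_days) as longest_streak
from (select user_id,
             count(distinct check_in_date::date) as num_days
      from (select ci.*,
                   dense_rank() over (partition by user_id
                                      order by check_in_date::date) as seq
            from check_ins ci
           ) ci
      -- one group per streak, as in the query above
      group by user_id, check_in_date::date - seq * interval '1 day'
     ) streaks
group by user_id
order by longest_streak desc;
```

This form is plain standard SQL apart from the `::` casts and the interval arithmetic, so it may be easier to port to databases that do not have distinct on.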