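I have a table with users and date values. Each user can have multiple date values. In the script below, I add the columns sincePrevious and sinceFirst per user during the select. I based this on another answer I found on Stack Overflow.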
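-- USER is a reserved word in T-SQL, so [user] (and [date], for consistency) are bracketed.
-- Unbracketed, PARTITION BY user would partition by the USER function instead of the column.
SELECT
  a.[user] AS [user]
  ,a.[date] AS [date]
  ,ISNULL(DATEDIFF(day, b.[date], a.[date]), 0) AS sincePrevious
  ,DATEDIFF(day, MIN(a.[date]) OVER (PARTITION BY a.[user]), a.[date]) AS sinceFirst
FROM
  (SELECT *, ROW_NUMBER() OVER (PARTITION BY [user] ORDER BY [date]) AS Rank FROM HUT_regels) AS a
LEFT JOIN
  (SELECT *, ROW_NUMBER() OVER (PARTITION BY [user] ORDER BY [date]) AS Rank FROM HUT_regels) AS b
  -- pair each row with the previous row of the same user
  ON a.[user] = b.[user] AND a.Rank = b.Rank + 1
ORDER BY [user], [date]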
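I also want to add another column in the same way (during the select). This column should assign a unique group ID based on the user value and the time difference between two rows. In the example below I have added some groupIds (dates are in dd-mm-yyyy format). If the difference between two consecutive dates of the same user is greater than 50 days (in this example), the later date must be treated as the start of a new "sequence". groupIds 2, 3 and 4 reflect this.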
user date sincePrevious sinceFirst groupId
100000029 25-05-2012 0 0 1
100002161 08-01-2012 0 0 2
100002161 04-02-2012 27 27 2
100002161 15-02-2012 11 38 2
100002161 28-03-2012 42 80 2
100002161 23-05-2012 56 136 3
100002161 11-07-2012 49 185 3
100002161 29-08-2012 49 234 3
100002161 24-10-2012 56 290 4
100002161 21-11-2012 28 318 4
100005242 07-05-2013 0 0 5
100005242 10-05-2013 3 3 5
100005242 14-05-2013 4 7 5
100005242 17-05-2013 3 10 5
100005242 21-05-2013 4 14 5
100005242 24-05-2013 3 17 5
100005242 28-05-2013 4 21 5
100005242 07-06-2013 10 31 5
...
The groupIds must be unique, but they do not have to be consecutive or evenly spaced.
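Before writing any SQL, the rule can be pinned down with a small reference implementation. The Python sketch below walks the rows sorted by user and date. It starts a new group at each user's first row, and also whenever the gap to the previous row of the same user exceeds 50 days. A single global counter keeps the group ids unique across users. The helper `assign_group_ids` and the constant `GAP_DAYS` are hypothetical names for illustration only. Run on the sample rows, the sketch reproduces the sincePrevious, sinceFirst and groupId columns of the table above.

```python
from datetime import date

GAP_DAYS = 50  # threshold from the question: a gap of more than 50 days starts a new sequence

# Sample rows from the question's table, as (user, date) pairs.
ROWS = [
    (100000029, date(2012, 5, 25)),
    (100002161, date(2012, 1, 8)), (100002161, date(2012, 2, 4)),
    (100002161, date(2012, 2, 15)), (100002161, date(2012, 3, 28)),
    (100002161, date(2012, 5, 23)), (100002161, date(2012, 7, 11)),
    (100002161, date(2012, 8, 29)), (100002161, date(2012, 10, 24)),
    (100002161, date(2012, 11, 21)),
    (100005242, date(2013, 5, 7)), (100005242, date(2013, 5, 10)),
    (100005242, date(2013, 5, 14)), (100005242, date(2013, 5, 17)),
    (100005242, date(2013, 5, 21)), (100005242, date(2013, 5, 24)),
    (100005242, date(2013, 5, 28)), (100005242, date(2013, 6, 7)),
]


def assign_group_ids(rows, gap_days=GAP_DAYS):
    """Return (user, date, sincePrevious, sinceFirst, groupId) tuples.

    A new group starts at each user's first row and whenever the gap to the
    previous row of the same user exceeds gap_days. The counter is global,
    so group ids are unique across users.
    """
    result = []
    group_id = 0
    prev_user = prev_date = first_date = None
    for user, d in sorted(rows):
        if user != prev_user:
            # First row of a new user: always a new group, no previous date.
            group_id += 1
            first_date = d
            since_previous = 0
        else:
            since_previous = (d - prev_date).days
            if since_previous > gap_days:
                group_id += 1
        result.append((user, d, since_previous, (d - first_date).days, group_id))
        prev_user, prev_date = user, d
    return result


for row in assign_group_ids(ROWS):
    print(row)
```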
I know this can be done with a CTE, but I would like a solution similar to the way sincePrevious and sinceFirst are computed.
My use case is SQL Server, but a more general solution (I mentioned MySQL, but PostgreSQL is fine too) could help others as well.
Answer
First, you should use lag() and min() to get the values:
-- [user] is bracketed because USER is a reserved word in T-SQL
select r.*,
       datediff(day, lag([date]) over (partition by [user] order by [date]), [date]) as sincePrevious,
       datediff(day, min([date]) over (partition by [user]), [date]) as sinceFirst
from HUT_regels r;
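To try the lag()/min() approach without a SQL Server instance, here is a sketch that runs an equivalent query against an in-memory SQLite database from Python. SQLite has no DATEDIFF, so the day difference is swapped for julianday() subtraction cast to an integer. Identifiers are double-quoted, and dates are stored as ISO yyyy-mm-dd text. The window functions require SQLite 3.25 or later. The function name `since_columns` is hypothetical. As with lag() in SQL Server, sincePrevious is NULL (None in Python) on each user's first row.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO yyyy-mm-dd text.
ROWS = [(100000029, "2012-05-25")]
ROWS += [(100002161, d) for d in (
    "2012-01-08", "2012-02-04", "2012-02-15", "2012-03-28", "2012-05-23",
    "2012-07-11", "2012-08-29", "2012-10-24", "2012-11-21")]
ROWS += [(100005242, d) for d in (
    "2013-05-07", "2013-05-10", "2013-05-14", "2013-05-17",
    "2013-05-21", "2013-05-24", "2013-05-28", "2013-06-07")]

# SQLite translation of the answer's query: julianday() subtraction replaces
# SQL Server's DATEDIFF(day, ...). LAG() is NULL on each user's first row.
QUERY = """
SELECT r.*,
       CAST(julianday("date")
            - julianday(LAG("date") OVER (PARTITION BY "user" ORDER BY "date")) AS INTEGER) AS sincePrevious,
       CAST(julianday("date")
            - julianday(MIN("date") OVER (PARTITION BY "user")) AS INTEGER) AS sinceFirst
FROM HUT_regels r
ORDER BY "user", "date"
"""


def since_columns(rows):
    """Load rows into an in-memory HUT_regels table and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE HUT_regels ("user" INTEGER, "date" TEXT)')
        con.executemany("INSERT INTO HUT_regels VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in since_columns(ROWS):
    print(row)
```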
To add groupId, you only need a subquery and a conditional cumulative sum:
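select r.*,
       -- a gap of more than 50 days starts a new group; on a user's first row
       -- sincePrevious is NULL, so the CASE falls through to 1 and starts group 1
       sum(case when sincePrevious <= 50 then 0 else 1 end) over
           (partition by [user] order by [date]) as groupId
from (select r.*,
             datediff(day, lag([date]) over (partition by [user] order by [date]), [date]) as sincePrevious,
             datediff(day, min([date]) over (partition by [user]), [date]) as sinceFirst
      from HUT_regels r
     ) r;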
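The same SQLite harness can run the groupId step. This sketch applies the rule from the question, so a gap of more than 50 days starts a group. The answer's groupId restarts at 1 for each user, which makes it unique only in combination with user. The question asks for ids that are unique on their own. The sketch therefore adds one extra step, DENSE_RANK() over (user, groupId), as one possible way to get that. This globalGroupId column is an assumption of this sketch, not part of the answer. julianday() again replaces DATEDIFF, and `group_columns` is a hypothetical name.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO yyyy-mm-dd text.
ROWS = [(100000029, "2012-05-25")]
ROWS += [(100002161, d) for d in (
    "2012-01-08", "2012-02-04", "2012-02-15", "2012-03-28", "2012-05-23",
    "2012-07-11", "2012-08-29", "2012-10-24", "2012-11-21")]
ROWS += [(100005242, d) for d in (
    "2013-05-07", "2013-05-10", "2013-05-14", "2013-05-17",
    "2013-05-21", "2013-05-24", "2013-05-28", "2013-06-07")]

QUERY = """
SELECT g.*,
       -- hypothetical extra step: one id per (user, groupId) pair, unique across users
       DENSE_RANK() OVER (ORDER BY "user", groupId) AS globalGroupId
FROM (
    SELECT r.*,
           -- NULL sincePrevious (first row of a user) falls through to ELSE 1
           SUM(CASE WHEN sincePrevious <= 50 THEN 0 ELSE 1 END)
               OVER (PARTITION BY "user" ORDER BY "date") AS groupId
    FROM (
        SELECT t.*,
               CAST(julianday("date")
                    - julianday(LAG("date") OVER (PARTITION BY "user" ORDER BY "date")) AS INTEGER) AS sincePrevious
        FROM HUT_regels t
    ) r
) g
ORDER BY "user", "date"
"""


def group_columns(rows):
    """Return (user, date, sincePrevious, groupId, globalGroupId) rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE HUT_regels ("user" INTEGER, "date" TEXT)')
        con.executemany("INSERT INTO HUT_regels VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in group_columns(ROWS):
    print(row)
```

On the sample data, globalGroupId matches the groupId column in the question's table.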
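The window functions used here are all ANSI standard. However, SQL Server only fully supports them from SQL Server 2012 onward, when lag() and ORDER BY inside an aggregate's over() clause were added. In earlier versions, you can use apply instead.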
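For versions without lag() or ordered window sums, the answer suggests apply. SQLite supports neither APPLY nor TOP, so the sketch below swaps in correlated subqueries. This illustrates the window-free logic only and is not T-SQL syntax. The previous date comes from a MAX() lookup, which returns the same value that an OUTER APPLY (SELECT TOP 1 ... ORDER BY [date] DESC) lookup would return. For groupId, the sketch counts the user's group-starting rows up to and including the current row. A row counts as a group start when no earlier row of the same user lies within 50 days. The sketch assumes at most one row per user per date, and `no_window_groups` is a hypothetical name.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO yyyy-mm-dd text.
ROWS = [(100000029, "2012-05-25")]
ROWS += [(100002161, d) for d in (
    "2012-01-08", "2012-02-04", "2012-02-15", "2012-03-28", "2012-05-23",
    "2012-07-11", "2012-08-29", "2012-10-24", "2012-11-21")]
ROWS += [(100005242, d) for d in (
    "2013-05-07", "2013-05-10", "2013-05-14", "2013-05-17",
    "2013-05-21", "2013-05-24", "2013-05-28", "2013-06-07")]

QUERY = """
SELECT t."user", t."date",
       -- previous date of the same user (NULL on the user's first row)
       CAST(julianday(t."date") - julianday(
           (SELECT MAX(p."date") FROM HUT_regels p
             WHERE p."user" = t."user" AND p."date" < t."date")) AS INTEGER) AS sincePrevious,
       -- number of group-starting rows of this user up to and including this row
       (SELECT COUNT(*) FROM HUT_regels s
         WHERE s."user" = t."user" AND s."date" <= t."date"
           AND NOT EXISTS (SELECT 1 FROM HUT_regels p
                            WHERE p."user" = s."user" AND p."date" < s."date"
                              AND julianday(s."date") - julianday(p."date") <= 50)
       ) AS groupId
FROM HUT_regels t
ORDER BY t."user", t."date"
"""


def no_window_groups(rows):
    """Return (user, date, sincePrevious, groupId) rows without window functions."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE HUT_regels ("user" INTEGER, "date" TEXT)')
        con.executemany("INSERT INTO HUT_regels VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in no_window_groups(ROWS):
    print(row)
```

Like the windowed version, this groupId is per user. The correlated subqueries rescan the table for every row, so this form is mainly useful where window functions are unavailable.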