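I want to rank rows on the ID and Value columns, in ascending order of UID. The expected output must increase as soon as the Value column differs from the previous row's value. The ranking must restart on every new ID.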
UID ID Value Expected Output
1 1 0 1
2 1 0 1
3 1 1 2
4 1 1 2
5 1 1 2
6 1 0 3
7 1 1 4
8 1 0 5
9 1 0 5
10 1 0 5
11 2 1 1
12 2 1 1
13 2 0 2
14 2 0 2
15 2 1 3
Here is a sample dataset I created:
CREATE TABLE [dbo].[Data] (
[UID] [int] NOT NULL,
[ID] [int] NULL,
[Value] [int] NULL
);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (1, 1, 0);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (2, 1, 0);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (3, 1, 1);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (4, 1, 1);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (5, 1, 1);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (6, 1, 0);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (7, 1, 1);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (8, 1, 0);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (9, 1, 0);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (10, 1, 0);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (11, 2, 1);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (12, 2, 1);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (13, 2, 0);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (14, 2, 0);
INSERT [dbo].[Data] ([UID], [ID], [Value]) VALUES (15, 2, 1);
Answer 0 (score: 5)
I think the simplest way to solve this gaps-and-islands problem is to use lag() to retrieve the "previous" value, and then a window sum that increments every time the value changes.
select uid, id, value,
    1 + sum(case when value <> lag_value then 1 else 0 end)
        over(partition by id order by uid) grp
from (
    select d.*, lag(value, 1, value) over(partition by id order by uid) lag_value
    from data d
) d
order by uid
Result:
uid | id | value | grp
--: | -: | ----: | --:
1 | 1 | 0 | 1
2 | 1 | 0 | 1
3 | 1 | 1 | 2
4 | 1 | 1 | 2
5 | 1 | 1 | 2
6 | 1 | 0 | 3
7 | 1 | 1 | 4
8 | 1 | 0 | 5
9 | 1 | 0 | 5
10 | 1 | 0 | 5
11 | 2 | 1 | 1
12 | 2 | 1 | 1
13 | 2 | 0 | 2
14 | 2 | 0 | 2
15 | 2 | 1 | 3
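For anyone who wants to see the logic outside SQL, here is a rough procedural sketch of the same idea in Python: walk the rows in UID order, remember the last value seen per ID, and bump a per-ID counter whenever the value changes. The names `SAMPLE` and `change_groups` are made up for illustration, and the sketch assumes the Value column contains no NULLs (SQL's `<>` treats NULL differently from Python's `!=`).

```python
# Sample rows as (uid, id, value), copied from the question's INSERT statements.
SAMPLE = [
    (1, 1, 0), (2, 1, 0), (3, 1, 1), (4, 1, 1), (5, 1, 1),
    (6, 1, 0), (7, 1, 1), (8, 1, 0), (9, 1, 0), (10, 1, 0),
    (11, 2, 1), (12, 2, 1), (13, 2, 0), (14, 2, 0), (15, 2, 1),
]


def change_groups(rows):
    """Return (uid, id, value, grp) tuples in uid order.

    grp starts at 1 for each id and increases by one every time the
    value differs from the previous row of the same id.
    """
    previous = {}  # id -> value of the previous row (plays the role of lag)
    group = {}     # id -> current group number (plays the role of 1 + running sum)
    result = []
    for uid, id_, value in sorted(rows, key=lambda r: r[0]):
        if id_ not in group:
            # First row of a partition: lag defaults to the row's own value,
            # so no change is counted and the group is 1.
            group[id_] = 1
        elif value != previous[id_]:
            group[id_] += 1
        previous[id_] = value
        result.append((uid, id_, value, group[id_]))
    return result


if __name__ == "__main__":
    for row in change_groups(SAMPLE):
        print(row)
```

The `previous` dictionary stands in for `lag(value, 1, value)` and the `group` dictionary for the partitioned running sum, so the output column should line up with `grp` in the table above.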
Answer 1 (score: 2)
This is a gaps-and-islands problem. I think the simplest way is to use the difference in row numbers method:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY UID) rn1,
ROW_NUMBER() OVER (PARTITION BY ID, [Value] ORDER BY UID) rn2
FROM Data
)
SELECT *, DENSE_RANK() OVER (PARTITION BY ID ORDER BY rn1 - rn2, [Value]) AS output
FROM cte
ORDER BY UID;
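To make the intermediate values easier to follow, here is a small Python sketch that mimics what the query above computes: `rn1`, `rn2`, their difference, and a dense rank over `(rn1 - rn2, Value)` within each ID. The names `SAMPLE` and `row_number_difference` are hypothetical, and this is only an emulation of the query, not something run against SQL Server.

```python
from collections import defaultdict

# Sample rows as (uid, id, value), copied from the question's INSERT statements.
SAMPLE = [
    (1, 1, 0), (2, 1, 0), (3, 1, 1), (4, 1, 1), (5, 1, 1),
    (6, 1, 0), (7, 1, 1), (8, 1, 0), (9, 1, 0), (10, 1, 0),
    (11, 2, 1), (12, 2, 1), (13, 2, 0), (14, 2, 0), (15, 2, 1),
]


def row_number_difference(rows):
    """Return (uid, diff, output) tuples in uid order.

    diff is rn1 - rn2; output is the dense rank of (diff, value) within each id.
    """
    rn1 = defaultdict(int)  # row number per id
    rn2 = defaultdict(int)  # row number per (id, value)
    staged = []
    for uid, id_, value in sorted(rows, key=lambda r: r[0]):
        rn1[id_] += 1
        rn2[(id_, value)] += 1
        staged.append((uid, id_, value, rn1[id_] - rn2[(id_, value)]))

    # Dense rank: number the distinct (diff, value) pairs of each id in sorted order.
    keys = defaultdict(set)
    for _, id_, value, diff in staged:
        keys[id_].add((diff, value))
    rank = {
        id_: {key: i + 1 for i, key in enumerate(sorted(pairs))}
        for id_, pairs in keys.items()
    }
    return [(uid, diff, rank[id_][(diff, value)]) for uid, id_, value, diff in staged]


if __name__ == "__main__":
    for row in row_number_difference(SAMPLE):
        print(row)
```

On the sample data the last column comes out as 1, 1, 2, 2, 2, 3, 4, 5, 5, 5 for ID 1 and 1, 1, 2, 2, 3 for ID 2, matching the expected output. One thing worth checking on your own data: the rank is ordered by `rn1 - rn2`, which does not always follow UID order. For a single ID with values 0, 0, 0, 1, 0 this emulation yields 1, 1, 1, 3, 2 rather than 1, 1, 1, 2, 3.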