I'm having a hard time explaining this, so let me try to lay out my problem. I have a table similar to the following:
Source   Value   User
======== ======= ======
old1     1       Phil
new      2       Phil
old2     3       Phil
new      4       Phil
old1     1       Mike
old2     2       Mike
new      1       Jeff
new      2       Jeff
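What I need to do is create a query that picks a value for each user based on the source and the value. It should follow this rule:
For each user, take the highest value. However, ignore the 'new' source if an 'old1' or 'old2' row exists for that user.
So, based on this rule, my query should return the following from this table: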
Value   User
======= ======
3       Phil
2       Mike
2       Jeff
I've come up with a query that gets close to what is required:
SELECT MAX([Value]), [User]
FROM
(
    SELECT CASE [Source]
               WHEN 'old1' THEN 1
               WHEN 'old2' THEN 1
               WHEN 'new' THEN 2
           END AS [SourcePriority],
           [Value],
           [User]
    FROM #UserValues
) MainPriority
WHERE [SourcePriority] = 1
GROUP BY [User]
UNION
SELECT MAX([Value]), [User]
FROM
(
    SELECT CASE [Source]
               WHEN 'old1' THEN 1
               WHEN 'old2' THEN 1
               WHEN 'new' THEN 2
           END AS [SourcePriority],
           [Value],
           [User]
    FROM #UserValues
) SecondaryPriority
WHERE [SourcePriority] = 2
GROUP BY [User]
But this returns the following result:
Value   User
======= ======
3       Phil
4       Phil
2       Mike
2       Jeff
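Obviously, the extra value of 4 for Phil is not wanted. How should I go about fixing this query? I also realize this is a rather convoluted solution that could probably be solved with proper use of aggregates, but I'm not very familiar with aggregates, which is why I resorted to a union. Essentially, I'm looking for help creating the cleanest solution possible.
If anyone wants to populate the table to try it themselves, here is the SQL code: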
CREATE TABLE #UserValues
(
    [Source] VARCHAR(10),
    [Value] INT,
    [User] VARCHAR(10)
)
INSERT INTO #UserValues VALUES
    ('old1', 1, 'Phil'),
    ('new', 2, 'Phil'),
    ('old2', 3, 'Phil'),
    ('new', 4, 'Phil'),
    ('old1', 1, 'Mike'),
    ('old2', 2, 'Mike'),
    ('new', 1, 'Jeff'),
    ('new', 2, 'Jeff')
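Answer 0 (score: 2)
You can solve this fairly easily without resorting to window functions. In this case, you need the maximum value over the rows where ((the source is not new) OR (the user has no old1 or old2 entry)).
Here is a query that works correctly with your sample data: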
SELECT
    MAX(U1.[Value]) as 'Value'
    ,U1.[User]
FROM
    #UserValues U1
WHERE
    U1.[Source] <> 'new'
    OR NOT EXISTS (SELECT * FROM #UserValues U2 WHERE U2.[User] = U1.[User] AND U2.[Source] IN ('old1','old2'))
GROUP BY U1.[User]
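The same rule can probably also be written as a single conditional aggregate, without the correlated subquery. This is only a sketch, assuming the #UserValues temp table from the question and that 'new' is the only source that should ever be ignored:

```sql
-- Sketch only: assumes the #UserValues temp table created in the question.
SELECT
    -- The CASE yields NULL for 'new' rows and MAX skips NULLs, so the first
    -- argument is the highest old1/old2 value, or NULL if the user has none.
    -- In that case COALESCE falls back to the plain maximum over all rows.
    COALESCE(MAX(CASE WHEN [Source] <> 'new' THEN [Value] END),
             MAX([Value])) AS [Value],
    [User]
FROM #UserValues
GROUP BY [User];
```

On the sample data this should give 3 for Phil, 2 for Mike and 2 for Jeff. SQL Server may print a warning that NULL values were eliminated by an aggregate; that is expected here.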
Answer 1 (score: 1)
You can use order by together with row_number() to apply the priority:
select top (1) with ties uv.*
from #UserValues uv
order by row_number() over (partition by [user]
                            order by (case when source = 'old2' then 1 when source = 'old1' then 2 else 3 end), value desc
                           );
However, if source is limited to these three values, you could also do:
. . .
order by row_number() over (partition by [user]
                            order by (case when source = 'new' then 2 else 1 end), value desc
                           )
Answer 2 (score: 1)
with raw_data
as (
    -- cnt_old: how many old1/old2 rows this user has
    select count(case when a.source in ('old1','old2') then 1 end) over (partition by a.[user]) as cnt_old
          ,a.*
    from #UserValues a
)
,curated_data
as (
    select rd.*
          ,row_number() over (partition by rd.[user] order by rd.value desc) as rnk2
    from raw_data rd
    -- drop every 'new' row for users that have at least one old1/old2 row
    where 0 = case when rd.source = 'new' and rd.cnt_old > 0 then 1 else 0 end
)
select *
from curated_data
where rnk2 = 1
Here is what I'm doing:
raw_data -> For every row, I count how many of that user's records have old1 or old2 in the source column (cnt_old).
curated_data -> If cnt_old > 0, I eliminate the user's 'new' records. I then rank the remaining records (rnk2) by value, highest first.
Finally, I select the highest available value from curated_data (i.e. rnk2 = 1).
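Answer 3 (score: 1)
I think you should consider setting up an XREF table that defines which source has which priority, to allow for more complicated prioritization in the future. I did it with a temp table: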
CREATE TABLE #SourcePriority
(
    [Source] VARCHAR(10),
    [SourcePriority] INT
)
INSERT INTO #SourcePriority VALUES
    ('old1', 1),
    ('old2', 1),
    ('new', 2)
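You could also create a view that looks up the SourcePriority for the original table. I used a CTE instead; here is one possible implementation that finds, for each user, the highest value within that user's top priority: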
;WITH CTE as (
    SELECT s.[SourcePriority], u.[Value], u.[User]
    FROM #UserValues as u
    INNER JOIN #SourcePriority as s on u.[Source] = s.[Source]
)
SELECT MAX (v.[Value]) as [Value], v.[User]
FROM (
    SELECT MIN ([SourcePriority]) as [TopPriority], [User]
    FROM cte
    GROUP BY [User]
) as s
INNER JOIN cte as v
    ON s.[User] = v.[User] and s.[TopPriority] = v.[SourcePriority]
GROUP BY v.[User]
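The self-join on the CTE could arguably be replaced by a windowed MIN that computes each user's top priority in the same pass. This is just a sketch of that variant, assuming the #UserValues and #SourcePriority temp tables exist as created above; the alias r is only illustrative:

```sql
-- Sketch only: assumes #UserValues and #SourcePriority exist as created above.
SELECT MAX(r.[Value]) AS [Value], r.[User]
FROM (
    SELECT u.[Value],
           u.[User],
           s.[SourcePriority],
           -- Lowest priority number per user, i.e. that user's top priority
           MIN(s.[SourcePriority]) OVER (PARTITION BY u.[User]) AS [TopPriority]
    FROM #UserValues AS u
    INNER JOIN #SourcePriority AS s ON u.[Source] = s.[Source]
) AS r
-- Keep only the rows that belong to the user's top priority
WHERE r.[SourcePriority] = r.[TopPriority]
GROUP BY r.[User];
```

As with the CTE version, adding a new source only requires a new row in #SourcePriority; the query itself stays unchanged.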
Answer 4 (score: 0)
I think you want:
select top (1) with ties uv.*
from (select uv.*,
             sum(case when source in ('old1', 'old2') then 1 else 0 end) over (partition by [User]) as cnt_old
      from #UserValues uv
     ) uv
where cnt_old = 0 or source <> 'new'
order by row_number() over (partition by [User] order by value desc);