Trying to simplify a SQL query without UNION

Time: 2018-11-20 17:25:58

Tags: sql sql-server tsql

I find this hard to explain, so let me try to lay out my problem. I have a table similar to the following:

 Source    Value    User
========  =======  ======
  old1       1      Phil
  new        2      Phil
  old2       3      Phil
  new        4      Phil
  old1       1      Mike
  old2       2      Mike
  new        1      Jeff
  new        2      Jeff

What I need to do is create a query that gets one value per user, based on the source and the value. It should follow this rule:

  

For each user, get the highest value. However, disregard the 'new' source if either 'old1' or 'old2' exists for that user.

So, based on this rule, my query should return the following from this table:

 Value    User
=======  ======
   3      Phil
   2      Mike
   2      Jeff

I have come up with a query that gets close to what is being asked for:

SELECT      MAX([Value]), [User]
FROM
(
    SELECT  CASE [Source]
                WHEN 'old1' THEN 1
                WHEN 'old2' THEN 1
                WHEN 'new'  THEN 2
            END                 AS [SourcePriority],
            [Value],
            [User]
    FROM    #UserValues
) MainPriority
WHERE       [SourcePriority] = 1
GROUP BY    [User]
UNION
SELECT      MAX([Value]), [User]
FROM
(
    SELECT  CASE [Source]
                WHEN 'old1' THEN 1
                WHEN 'old2' THEN 1
                WHEN 'new'  THEN 2
            END                 AS [SourcePriority],
            [Value],
            [User]
    FROM    #UserValues
) SecondaryPriority
WHERE       [SourcePriority] = 2
GROUP BY    [User]

But this returns the following result:

 Value    User
=======  ======
   3      Phil
   4      Phil
   2      Mike
   2      Jeff

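Obviously, the extra row with Phil = 4 is not wanted. How should I go about fixing this query? I also realize this is a rather convoluted solution that could probably be solved with proper use of aggregates, but I am not very familiar with aggregates yet, which is why I resorted to a UNION. Essentially, I am looking for help creating the cleanest possible solution.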

Here is the SQL code if anyone wants to populate the table themselves to try it out:

CREATE TABLE #UserValues
(
    [Source] VARCHAR(10),
    [Value]  INT,
    [User]   VARCHAR(10)
)
INSERT INTO #UserValues VALUES
('old1', 1, 'Phil'),
('new',  2, 'Phil'),
('old2', 3, 'Phil'),
('new',  4, 'Phil'),
('old1', 1, 'Mike'),
('old2', 2, 'Mike'),
('new',  1, 'Jeff'),
('new',  2, 'Jeff')

5 answers:

Answer 0: (score: 2)

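You can solve this fairly easily without resorting to window functions. In this case, you need the maximum value over the rows where ((the source is not 'new') OR (the user has no 'old1' or 'old2' entry)).

Here is a query that handles your sample data correctly: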

SELECT
    MAX(U1.[Value]) as 'Value'
    ,U1.[User]
FROM
    #UserValues U1
WHERE
    U1.[Source] <> 'new' 
    OR NOT EXISTS (SELECT * FROM #UserValues U2 WHERE U2.[User] = U1.[User] AND U2.[Source] IN ('old1','old2'))
GROUP BY U1.[User]
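
The question also asks whether aggregates alone could do the job. As a minimal sketch (not part of the original answer), the same rule can be written with conditional aggregation in a single pass over the table. It assumes that [Value] is never NULL on 'old1'/'old2' rows, so that a NULL from the first MAX reliably means "this user has no old rows":

```sql
-- Sketch: conditional aggregation, one scan of #UserValues.
-- Assumption: [Value] is never NULL for 'old1'/'old2' rows.
SELECT
    COALESCE(
        -- Highest value among the "old" sources; NULL when the user has none.
        MAX(CASE WHEN [Source] IN ('old1', 'old2') THEN [Value] END),
        -- Fallback: highest value overall, used only when there are no old rows.
        MAX([Value])
    ) AS [Value],
    [User]
FROM #UserValues
GROUP BY [User];
```

With the sample data this gives 3 for Phil, 2 for Mike and 2 for Jeff. SQL Server may print a "Null value is eliminated by an aggregate" warning for the CASE expression, which is harmless here.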

Answer 1: (score: 1)

You can use a priority in the order by together with row_number():

select top (1) with ties uv.*
from #UserValues uv
order by row_number() over (partition by [user] 
                            order by (case when source = 'old2' then 1 when source = 'old1' then 2 else 3 end), value desc 
                           );

However, if source is limited to just these three values, you could also do:

. . . 
order by row_number() over (partition by [user] 
                            order by (case when source = 'new' then 2 else 1 end), value desc 
                           )

Answer 2: (score: 1)

with raw_data
      as (
    select row_number() over(partition by a.[user] order by a.value desc) as rnk
          ,count(case when a.source in('old1','old2') then 1 end) over(partition by a.[user]) as cnt_old 
          ,a.*
      from #UserValues a
         )
        ,curated_data  
         as(select *
                  ,row_number() over(partition by rd.[user] order by rd.value desc) as rnk2
             from raw_data rd
            where 0 = case when rnk=1 and source='new' and cnt_old>0 then 1 else 0 end 
           )
    select *
      from curated_data
     where rnk2=1

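Here is what I am doing:

  1. raw_data -> First, I rank the values for each user by the highest available value. I also count whether the user has any records whose source column is old1 or old2.

  2. curated_data -> If cnt_old > 0, I eliminate the record with the highest value (rnk = 1) when its source is new. I then rank the remaining records (rnk2) by the highest value available in this result set.

  3. I select the highest available value from curated_data (i.e. rnk2 = 1).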
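
Note that the filter in step 2 removes only the single top-ranked 'new' row. If a user had two 'new' rows that both exceed every old value, the second one would survive and be returned. The following is a hedged variant, not the answerer's query, that drops every 'new' row whenever the user has an old row. It assumes [Source] is never NULL and reads from the question's #UserValues temp table:

```sql
-- Sketch: same CTE structure, but all 'new' rows are excluded
-- for users that have at least one 'old1'/'old2' row.
WITH raw_data AS (
    SELECT a.*,
           COUNT(CASE WHEN a.[Source] IN ('old1', 'old2') THEN 1 END)
               OVER (PARTITION BY a.[User]) AS cnt_old
    FROM #UserValues a
),
curated_data AS (
    SELECT rd.*,
           ROW_NUMBER() OVER (PARTITION BY rd.[User]
                              ORDER BY rd.[Value] DESC) AS rnk
    FROM raw_data rd
    WHERE NOT (rd.[Source] = 'new' AND rd.cnt_old > 0)
)
SELECT [Value], [User]
FROM curated_data
WHERE rnk = 1;
```

Only one ranking pass is needed here, because the unwanted rows are filtered out before ROW_NUMBER() is computed.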

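Answer 3: (score: 1)

I think you should consider setting up an XREF table that defines which source has which priority, to allow for more complex prioritization in the future. I do this here with a temp table: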

CREATE TABLE #SourcePriority
(
    [Source]         VARCHAR(10),
    [SourcePriority] INT
)
INSERT INTO #SourcePriority VALUES
('old1', 1),
('old2', 1), 
('new',  2)

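You could also create a View that looks up the SourcePriority for the original table. Here is one possible implementation, using a CTE, of how to find the highest value within the top priority: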

;WITH CTE as (
    SELECT s.[SourcePriority], u.[Value], u.[User]
    FROM   #UserValues as u
        INNER JOIN #SourcePriority as s on u.[Source] = s.[Source]
)
SELECT MAX (v.[Value]) as [Value], v.[User]
FROM (
    SELECT MIN ([SourcePriority]) as [TopPriority], [User]
    FROM   cte
    GROUP BY [User]
    ) as s
    INNER JOIN cte as v
        ON s.[User] = v.[User] and s.[TopPriority] = v.[SourcePriority]
GROUP BY v.[User]
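
The View mentioned above might look like the sketch below. SQL Server does not allow views on temporary tables, so this assumes hypothetical permanent tables dbo.UserValues and dbo.SourcePriority with the same columns as the temp tables; the view name dbo.UserValuesWithPriority is likewise made up for illustration:

```sql
-- Hypothetical permanent tables dbo.UserValues and dbo.SourcePriority
-- (same columns as #UserValues and #SourcePriority above).
CREATE VIEW dbo.UserValuesWithPriority
AS
SELECT s.[SourcePriority], u.[Value], u.[User]
FROM   dbo.UserValues AS u
    INNER JOIN dbo.SourcePriority AS s ON u.[Source] = s.[Source];
GO

-- Same lookup as the CTE version, reading from the view instead.
SELECT MAX(v.[Value]) AS [Value], v.[User]
FROM (
    SELECT MIN([SourcePriority]) AS [TopPriority], [User]
    FROM   dbo.UserValuesWithPriority
    GROUP BY [User]
    ) AS s
    INNER JOIN dbo.UserValuesWithPriority AS v
        ON s.[User] = v.[User] AND s.[TopPriority] = v.[SourcePriority]
GROUP BY v.[User];
```

GO is a batch separator understood by SSMS and sqlcmd; it is needed because CREATE VIEW must be the first statement in its batch.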

Answer 4: (score: 0)

I think you want:

-- [user] is bracketed because bare USER is a built-in T-SQL function,
-- which would put every row into a single partition.
select top (1) with ties uv.*
from (select uv.*,
             sum(case when source in ('old1', 'old2') then 1 else 0 end) over (partition by [user]) as cnt_old
      from #UserValues uv
     ) uv
where cnt_old = 0 or source <> 'new'
order by row_number() over (partition by [user] order by value desc);