Question

I am trying to write a query that partitions rows based on the value 90. Here is my table:

create table  #temp(StudentID char(2),    Status int) 
insert #temp  values('S1',75 ) 
insert #temp  values('S1',85 )
insert #temp  values('S1',90)
insert #temp  values('S1',85)
insert #temp  values('S1',83)
insert #temp  values('S1',90 ) 
insert #temp  values('S1',85)
insert #temp  values('S1',90)
insert #temp  values('S1',93 ) 
insert #temp  values('S1',93 ) 
insert #temp  values('S1',93 )

The required output is:

ID  Status  Result
S1  75      0
S1  85      0
S1  90      0
S1  85      1
S1  83      1
S1  90      1
S1  85      2
S1  90      2
S1  93      3
S1  93      3   
S1  93      3

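Does anyone have a solution that partitions on the Status value 90? The Result should be 1, 2, 3, ..., incrementing after each occurrence of the value 90.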

Answer 1

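Assuming the actual question is "how do I find ranges/islands of incrementing values", the answer can use LAG to compare against the previous Status value in some ordering. If the previous value was 90, you have a new island: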

declare @temp table (ID int identity PRIMARY KEY, StudentID char(2),    Status int) 

insert into @temp (StudentID,Status)
values
('S1',75), 
('S1',85),
('S1',90),
('S1',85),
('S1',83),
('S1',90), 
('S1',85),
('S1',90),
('S1',93), 
('S1',93), 
('S1',93);

select 
    * ,
    case LAG(Status,1,0) OVER (PARTITION BY StudentID ORDER BY ID) 
        when 90 then 1 else 0 end as NewIsland
from @temp

This returns:

+----+-----------+--------+-----------+
| ID | StudentID | Status | NewIsland |
+----+-----------+--------+-----------+
|  1 | S1        |     75 |         0 |
|  2 | S1        |     85 |         0 |
|  3 | S1        |     90 |         0 |
|  4 | S1        |     85 |         1 |
|  5 | S1        |     83 |         0 |
|  6 | S1        |     90 |         0 |
|  7 | S1        |     85 |         1 |
|  8 | S1        |     90 |         0 |
|  9 | S1        |     93 |         1 |
| 10 | S1        |     93 |         0 |
| 11 | S1        |     93 |         0 |
+----+-----------+--------+-----------+

You can create an island ID by summing all NewIsland values up to and including the current row, using SUM with the ROWS clause of OVER:

;with islands as (
    select * ,
        case LAG(Status,1,0) OVER (PARTITION BY StudentID ORDER BY ID)
            when 90 then 1 else 0 end as NewIsland
    from @temp
)
select * ,
    SUM(NewIsland) OVER (PARTITION BY StudentID ORDER BY ID ROWS UNBOUNDED PRECEDING) as Result
from islands

This produces:

+----+-----------+--------+-----------+--------+
| ID | StudentID | Status | NewIsland | Result |
+----+-----------+--------+-----------+--------+
|  1 | S1        |     75 |         0 |      0 |
|  2 | S1        |     85 |         0 |      0 |
|  3 | S1        |     90 |         0 |      0 |
|  4 | S1        |     85 |         1 |      1 |
|  5 | S1        |     83 |         0 |      1 |
|  6 | S1        |     90 |         0 |      1 |
|  7 | S1        |     85 |         1 |      2 |
|  8 | S1        |     90 |         0 |      2 |
|  9 | S1        |     93 |         1 |      3 |
| 10 | S1        |     93 |         0 |      3 |
| 11 | S1        |     93 |         0 |      3 |
+----+-----------+--------+-----------+--------+

BTW, this is a case of the wider Gaps & Islands problem in SQL.
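
The question's #temp table has no column that records row order, so the LAG approach needs something to order by before it can be applied there. A minimal sketch, assuming insertion order is the intended order and adding a hypothetical identity column named ID (it is not part of the question's table):

```sql
-- Assumption: rows should be processed in the order they were inserted.
-- The ID identity column is hypothetical; the question's #temp lacks it.
create table #temp (ID int identity primary key, StudentID char(2), Status int);

insert into #temp (StudentID, Status)
values ('S1',75), ('S1',85), ('S1',90), ('S1',85), ('S1',83), ('S1',90),
       ('S1',85), ('S1',90), ('S1',93), ('S1',93), ('S1',93);

with islands as (
    -- Flag each row whose previous row (per student, by ID) has Status 90.
    select *,
           case LAG(Status,1,0) OVER (PARTITION BY StudentID ORDER BY ID)
               when 90 then 1 else 0 end as NewIsland
    from #temp
)
-- The running total of the flags is the island number.
select ID, StudentID, Status,
       SUM(NewIsland) OVER (PARTITION BY StudentID ORDER BY ID
                            ROWS UNBOUNDED PRECEDING) as Result
from islands
order by ID;
```

Without some ordering column the result is not well defined, because a table has no inherent row order.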

Update

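LAG and OVER are available in all supported SQL Server versions, i.e. SQL Server 2012 and later. OVER is also available in SQL Server 2008, but LAG is not. In those versions, different and slower techniques are used to calculate islands: The SQL of Gaps and Islands in Sequences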

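In most cases ROW_NUMBER() is used to calculate the row ordering, which requires an extra CTE. That can be avoided if the desired ordering is the same as ID or any other unique, incrementing column. The following query returns the same results as the query that uses LAG: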

select * ,
    case when exists (select ID from @temp t1
                      where t1.StudentID=t2.StudentID
                        and t1.ID=t2.ID-1
                        and t1.Status=90)
         then 1 else 0 end as NewIsland
from @temp t2

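This query returns 1 if there is a row with the same StudentID, a Status of 90, and an ID (or ROW_NUMBER) one less than the current row's, i.e. the same row that LAG(Status,1) would read.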

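After that we only need a running total of NewIsland. Although SUM OVER was available in SQL Server 2008, it only supported PARTITION BY there; ORDER BY and ROWS for aggregates arrived in SQL Server 2012. We need to use another subquery: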

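;with islands as (
    select * ,
        case when exists (select ID from @temp t1
                          where t1.StudentID=t2.StudentID
                            and t1.ID=t2.ID-1
                            and t1.Status=90)
             then 1 else 0 end as NewIsland
    from @temp t2
)
select * ,
    (select ISNULL(SUM(NewIsland),0)
     from islands i1
     where i1.StudentID=i2.StudentID and i1.ID<=i2.ID) AS Result
from islands i2

This sums the NewIsland values of all rows of the same student whose ID is less than or equal to the current row's, which matches what SUM ... ROWS UNBOUNDED PRECEDING computes in the LAG version.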
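
The `t1.ID=t2.ID-1` test relies on ID having no gaps. Where that cannot be guaranteed, the ROW_NUMBER() route mentioned earlier can supply a gapless sequence instead. A sketch under that assumption, with `numbered` and `rn` as hypothetical names that do not appear in the original answer:

```sql
-- Run in the same batch as the @temp declaration.
-- rn is a gapless row number per student, ordered by ID.
;with numbered as (
    select *, ROW_NUMBER() OVER (PARTITION BY StudentID ORDER BY ID) as rn
    from @temp
),
islands as (
    -- Flag rows whose immediately preceding row (by rn) has Status 90.
    select n2.*,
           case when exists (select 1 from numbered n1
                             where n1.StudentID = n2.StudentID
                               and n1.rn = n2.rn - 1
                               and n1.Status = 90)
                then 1 else 0 end as NewIsland
    from numbered n2
)
-- Running total of the flags, computed with a correlated subquery.
select i2.*,
       (select ISNULL(SUM(i1.NewIsland),0)
        from islands i1
        where i1.StudentID = i2.StudentID and i1.rn <= i2.rn) as Result
from islands i2;
```

This costs the extra CTE that the answer tries to avoid, so it is only worth it when the ordering column is not a dense sequence.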

Performance

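All of these subqueries result in a lot of repeated scans. Surprisingly, though, the old query is faster than the query that uses LAG, because the first query has to sort intermediate results several times and filter by Status. The execution plan costs are 45% vs 55%.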

Things change dramatically when an index is added:

declare @temp table ( ID int identity PRIMARY KEY, StudentID char(2),    Status int, 
                      INDEX IX_TMP(StudentID,ID,Status))

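The multiple sorts disappear and the costs become 80% and 20%. The LAG query scans the index values only once, without sorting intermediate results.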

The subquery version cannot take advantage of the index.
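
The percentages above are relative plan costs. One way to check such a comparison on your own data is to look at actual reads and timings as well. A sketch, assuming it runs in the same batch as the @temp declaration and inserts; the measured numbers will depend on your data and server:

```sql
-- Report logical reads and CPU/elapsed time for each statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Query under test: the LAG version.
with islands as (
    select *,
           case LAG(Status,1,0) OVER (PARTITION BY StudentID ORDER BY ID)
               when 90 then 1 else 0 end as NewIsland
    from @temp
)
select *,
       SUM(NewIsland) OVER (PARTITION BY StudentID ORDER BY ID
                            ROWS UNBOUNDED PRECEDING) as Result
from islands;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Repeating the same wrapper around the subquery version gives a like-for-like comparison in the Messages output.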

Update 2

uzi suggested that it would be better to drop LAG and simply sum up to the previous row:

select * , 
       SUM(case when status =90 then 1 else 0 end) 
           OVER (PARTITION BY StudentID 
                 ORDER BY ID ROWS 
                 BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) 
from @temp;

Semantically this is the same: for each row, take all preceding rows, count 1 for each 90 and 0 for everything else, and sum.

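The server generates similar execution plans in both cases. The LAG version uses two Stream Aggregate operators, while the version without it uses one. For this limited data set, though, the end result is essentially the same.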

With larger data sets the results may differ, for example if the server has to spool data to tempdb because it does not fit in memory.
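
One detail worth checking with the version without LAG: for the first row of each student the frame `BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING` contains no rows, so SUM should return NULL rather than the 0 the question expects. A sketch that wraps the sum in ISNULL; the Result alias is an assumption taken from the question's expected output:

```sql
-- Run in the same batch as the @temp declaration.
select *,
       -- An empty frame (the first row per student) yields NULL; map it to 0.
       ISNULL(SUM(case when Status = 90 then 1 else 0 end)
                  OVER (PARTITION BY StudentID
                        ORDER BY ID
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) as Result
from @temp;
```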

Answer 2

Maybe this is not a great solution, but it works.

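-- Uses the Status column from the question's #temp table.
-- Note: #temp has no ordering column, so row_number() ordered by StudentID
-- does not guarantee that rows are numbered in insertion order.
SELECT StudentID ID 
, Status
, CASE
-- A row holding 90 still belongs to the island it closes, so subtract its own flag.
WHEN Status = 90 
THEN SUM(q) OVER(order by row) - 1 
ELSE SUM(q) OVER(order by row)
END Result 
FROM (  
    SELECT row_number() OVER(order by StudentID desc) row
        , *
        -- q flags the rows whose Status is 90.
        , CASE 
        WHEN Status = 90 
        THEN 1 
        ELSE 0 
        END q
    FROM #temp
) a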

Answer 3

You can simply use a subquery:

select *,
        coalesce((select sum(case when Status = 90 then 1 else 0 end) 
                  from table 
                  where studentid = t.studentid and 
                        ? < t.?) , 0) as Result
from table t;

Here, ? (i.e. an id) stands for the column that defines the actual ordering of your data.
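
To make the placeholders concrete, here is a sketch of the same subquery with them filled in. It assumes the table is the question's #temp extended with a hypothetical ID column that defines row order; neither the column nor the alias p comes from the original answer:

```sql
-- Assumption: #temp has an ID column (hypothetical) that defines row order.
select t.*,
       -- Count the rows with Status 90 that come before the current row.
       coalesce((select sum(case when p.Status = 90 then 1 else 0 end)
                 from #temp p
                 where p.StudentID = t.StudentID
                   and p.ID < t.ID), 0) as Result
from #temp t;
```

The coalesce covers the first row of each student, where the subquery finds no earlier rows and sum returns NULL.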

Partition based on a specified value
