Question

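I am trying to fill NULL results with the previous non-null result in MS Azure SQL.

Given the following query, shouldn't the Status column replace NULL values with the previous non-null value? The documentation for Lag(...) suggests that it reads the previous result row, so '2015-04-03' for athlete:7 should copy the value from 2015-04-02, which in turn was read from {{1}}. But in this case, 2015-04-01 appears to be picking up the table value rather than the coalesced result-row value. Any idea why?

I have seen alternative ways of performing this fill-from-above behavior, but they don't really explain what is going on, and I am struggling to understand them. Can someone explain (like I'm stupid) how to get the desired behavior?

Helpfully, when we transition to the next athlete number, their first day always has a status.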

Lag

Answer 1

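lag() is computed over the "before" data, that is, the rows as they are stored in the table. It does not use the result calculated for the previous row.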
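A minimal sketch of that behavior, run against an in-memory SQLite database through Python's standard `sqlite3` module (this assumes a bundled SQLite of 3.25 or later, which is when window functions arrived; the three-row table is a made-up sample, not the asker's data). Wrapping `LAG()` in `COALESCE()` fills only the first NULL after a known value, because the second NULL row looks back at a stored NULL:

```python
import sqlite3

# Hypothetical three-row sample: one known status followed by two NULLs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (row_index INTEGER, AthleteId INTEGER, Status INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 7, 2), (2, 7, None), (3, 7, None)],
)

# LAG() reads the stored Status of the previous row, not the COALESCE result
# computed for that row, so the fill reaches only one row down.
rows = conn.execute(
    """
    SELECT row_index,
           COALESCE(Status,
                    LAG(Status) OVER (PARTITION BY AthleteId ORDER BY row_index)) AS filled
    FROM t
    ORDER BY row_index
    """
).fetchall()

print(rows)
```

Row 2 picks up the 2 from row 1, while row 3 stays NULL because the stored Status of row 2 is NULL.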

Note that in all of the suggestions below I partition by athleteid. Given the nature of the data, that seems reasonable.

What you really want is:

select . . ., 
       LAG(Status) IGNORE NULLS over (partition by athleteid order by row_index)
from @AvailabilityDates ;

You can talk to Microsoft about implementing this ISO/ANSI standard functionality.

Absent that, one popular method uses OUTER APPLY:

select . . .,
       ad2.Status
from @AvailabilityDates ad OUTER APPLY
     (select top (1) status
      from @AvailabilityDates ad2
      where ad2.athleteid = ad.athleteid and
            ad2.status is not null and 
            ad2.row_index <= ad.row_index
      order by ad2.row_index desc
     ) ad2;
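The lookup above can be tried outside SQL Server as well. The sketch below is an approximation, not the answer's exact query: SQLite has no APPLY operator, so a correlated scalar subquery with `ORDER BY ... DESC LIMIT 1` stands in for `OUTER APPLY (select top (1) ...)`. The table and its six rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (row_index INTEGER, AthleteId INTEGER, Status INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 7, 2), (2, 7, None), (3, 7, None), (4, 7, 3), (5, 9, 4), (6, 9, None)],
)

# For each row, fetch the most recent non-NULL Status of the same athlete
# at or before that row. The <= lets a row with its own Status match itself.
filled = conn.execute(
    """
    SELECT t.row_index,
           (SELECT t2.Status
            FROM t AS t2
            WHERE t2.AthleteId = t.AthleteId
              AND t2.Status IS NOT NULL
              AND t2.row_index <= t.row_index
            ORDER BY t2.row_index DESC
            LIMIT 1) AS filled
    FROM t
    ORDER BY t.row_index
    """
).fetchall()

print(filled)
```

Unlike the `LAG()` attempt, this reaches back over any number of consecutive NULLs, at the cost of one lookup per row.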

Or, if there are only a few NULLs in a row, you can extend the COALESCE() (the version below covers up to three consecutive NULLs):

select . . .,
       coalesce(status,
                lag(status, 1) over (partition by athleteid order by row_index),
                lag(status, 2) over (partition by athleteid order by row_index),
                lag(status, 3) over (partition by athleteid order by row_index)
               )
from @AvailabilityDates;
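To make the limitation of the fixed-offset approach concrete, here is a small sketch using Python's `sqlite3` module (assuming SQLite 3.25+; the five-row table is invented for illustration). With offsets 1 through 3, a run of four NULLs leaves the last row unfilled:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (row_index INTEGER, AthleteId INTEGER, Status INTEGER)")
# Hypothetical data: one known status followed by four consecutive NULLs.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 7, 2), (2, 7, None), (3, 7, None), (4, 7, None), (5, 7, None)],
)

# Each LAG() looks a fixed distance back at the stored values,
# so the chain can bridge at most three NULL rows.
capped = conn.execute(
    """
    SELECT row_index,
           COALESCE(Status,
                    LAG(Status, 1) OVER (PARTITION BY AthleteId ORDER BY row_index),
                    LAG(Status, 2) OVER (PARTITION BY AthleteId ORDER BY row_index),
                    LAG(Status, 3) OVER (PARTITION BY AthleteId ORDER BY row_index)) AS filled
    FROM t
    ORDER BY row_index
    """
).fetchall()

print(capped)
```

Rows 2 to 4 are filled with 2, but row 5 remains NULL, so this form only suits data where the maximum gap length is known in advance.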

Answer 2

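Can anyone explain (like I'm stupid) how to get the desired behavior?

I get confused by window functions too, and I often fall back on the APPLY operator, which is powerful and (to me) easy to understand.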

use tempdb

drop table if exists t

create table t(row_index int, AthleteId int, AvailabilityDate datetime, Status int)

insert into t
(row_index   ,AthleteId   ,AvailabilityDate    ,Status)
values
(1  ,7 ,'2015-04-01',2    ),
(2  ,7 ,'2015-04-02',2     ),
(3  ,7 ,'2015-04-03',NULL ),
(4  ,7 ,'2015-04-04',NULL ),
(5  ,7 ,'2015-04-05',3     ),
(6  ,7 ,'2015-04-06',3     ),
(7  ,7 ,'2015-04-07',NULL ),
(8  ,7 ,'2015-04-08',NULL ),
(9  ,7 ,'2015-04-09',NULL ),
(10 ,9 ,'2015-04-01',2     ),
(11 ,9 ,'2015-04-02',2     ),
(12 ,9 ,'2015-04-03',NULL ),
(13 ,9 ,'2015-04-04',NULL ),
(14 ,9 ,'2015-04-05',NULL ),
(15 ,9 ,'2015-04-06',3     ),
(16 ,9 ,'2015-04-07',4     ),
(17 ,9 ,'2015-04-08',4     ),
(18 ,9 ,'2015-04-09',NULL ),
(19 ,9 ,'2015-04-10',NULL );


select t.*, s.status s2
from t 
outer apply 
 (
    select top 1 status 
    from t t2 
    where t2.AthleteId = t.AthleteId 
      and t2.row_index < t.row_index 
      and status is not null 
    order by row_index desc
  ) s

Answer 3

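Props to @jnevill, whose comment above about creating partitions pointed me in the right direction.

My solution is to create a running-total column, _grp, which adds the current status to the previous _grp. Any row with a null status therefore has the same _grp as all the nulls before it, back to the last non-null status. We then take the max status from the partition based on _grp; in any _grp partition there should be only one non-null status, the first in the partition (aka the last explicitly defined status). I hope that makes sense :)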

use tempdb

drop table if exists t

create table t(row_index int, AthleteId int, AvailabilityDate datetime, Status int)

insert into t
(row_index   ,AthleteId   ,AvailabilityDate    ,Status)
values
(1  ,7 ,'2015-04-01',2    ),
(2  ,7 ,'2015-04-02',2     ),
(3  ,7 ,'2015-04-03',NULL ),
(4  ,7 ,'2015-04-04',NULL ),
(5  ,7 ,'2015-04-05',3     ),
(6  ,7 ,'2015-04-06',3     ),
(7  ,7 ,'2015-04-07',NULL ),
(8  ,7 ,'2015-04-08',NULL ),
(9  ,7 ,'2015-04-09',NULL ),
(10 ,9 ,'2015-04-01',2     ),
(11 ,9 ,'2015-04-02',2     ),
(12 ,9 ,'2015-04-03',NULL ),
(13 ,9 ,'2015-04-04',NULL ),
(14 ,9 ,'2015-04-05',NULL ),
(15 ,9 ,'2015-04-06',3     ),
(16 ,9 ,'2015-04-07',4     ),
(17 ,9 ,'2015-04-08',4     ),
(18 ,9 ,'2015-04-09',NULL ),
(19 ,9 ,'2015-04-10',NULL );

-- this CTE sums the row by row status (anything with null status is the same _grp as the row above) which creates a suitable partition
-- (relies on Status values being positive, so that every non-null row changes the running sum)
WITH _partitioned (row_index, AthleteId, AvailabilityDate, Status, _grp) AS 
    (
    SELECT
        row_index,
        AthleteId,
        AvailabilityDate,
        Status,
        SUM(Status) OVER (ORDER BY row_index) as _grp
    FROM t
    )

SELECT 
    AthleteId, 
    AvailabilityDate, 
    Status,
-- this gets the max status in the partition which we created above
    MAX(Status) over (partition by _partitioned._grp order by _partitioned.row_index) as DerivedStatus   
FROM 
    _partitioned
    ORDER BY row_index
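A variant of the same grouping idea, sketched with Python's `sqlite3` module (assuming SQLite 3.25+; the sample rows are hypothetical). It is not the query above: it swaps the running `SUM(Status)` for a running `COUNT(Status)`, which counts non-null rows and so still starts a new group when a status is 0, and it partitions by AthleteId so groups never span two athletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (row_index INTEGER, AthleteId INTEGER, Status INTEGER)")
# Hypothetical data, including a Status of 0 that a running SUM would not notice.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 7, 2), (2, 7, None), (3, 7, 0), (4, 7, None), (5, 9, 2), (6, 9, None)],
)

derived = conn.execute(
    """
    WITH grouped AS (
        SELECT row_index, AthleteId, Status,
               -- running count of non-NULL statuses: NULL rows inherit the
               -- group number of the last row that had a status
               COUNT(Status) OVER (PARTITION BY AthleteId ORDER BY row_index) AS grp
        FROM t
    )
    SELECT row_index,
           -- each group holds exactly one non-NULL Status, so MAX returns it
           MAX(Status) OVER (PARTITION BY AthleteId, grp) AS DerivedStatus
    FROM grouped
    ORDER BY row_index
    """
).fetchall()

print(derived)
```

Rows 3 and 4 come out as 0 here, where a sum-based group key would have left them merged into the preceding group.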

SQL Lag() across multiple rows

3 Answers: