T-SQL: What is the best way to replace NULL with the most recent non-null value?

Time: 2018-01-05 02:55:28

Tags: sql sql-server tsql

Suppose I have this table:

+----+-------+
| id | value |
+----+-------+
|  1 |     5 |
|  2 |     4 |
|  3 |     1 |
|  4 |  NULL |
|  5 |  NULL |
|  6 |    14 |
|  7 |  NULL |
|  8 |     0 |
|  9 |     3 |
| 10 |  NULL |
+----+-------+

I want to write a query that replaces each NULL with the most recent preceding non-null value in the table.

I want this result:

+----+-------+
| id | value |
+----+-------+
|  1 |     5 |
|  2 |     4 |
|  3 |     1 |
|  4 |     1 |
|  5 |     1 |
|  6 |    14 |
|  7 |    14 |
|  8 |     0 |
|  9 |     3 |
| 10 |     3 |
+----+-------+

If no previous value exists, NULL is OK. Ideally, this should work even with an ORDER BY. For example, if I ORDER BY [id] DESC:

+----+-------+
| id | value |
+----+-------+
| 10 |  NULL |
|  9 |     3 |
|  8 |     0 |
|  7 |     0 |
|  6 |    14 |
|  5 |    14 |
|  4 |    14 |
|  3 |     1 |
|  2 |     4 |
|  1 |     5 |
+----+-------+

Or, even better, if I ORDER BY [value] DESC:

+----+-------+
| id | value |
+----+-------+
|  6 |    14 |
|  1 |     5 |
|  2 |     4 |
|  9 |     3 |
|  3 |     1 |
|  8 |     0 |
|  4 |     0 |
|  5 |     0 |
|  7 |     0 |
| 10 |     0 |
+----+-------+

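I think this might involve some kind of analytic function, somehow partitioning over the value column, but I'm not sure where to look.

8 answers:

Answer 0: (score: 2)

You can use a running sum to set up groups, and max to fill in the null values.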

select id,max(value) over(partition by grp) as value
from (select id,value,sum(case when value is not null then 1 else 0 end) over(order by id) as grp
      from tbl
     ) t

Change the order by in the over() clause to order by value desc to get the ORDER BY [value] DESC result from the question.
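To see why the running sum works, it may help to look at the grp column itself. Here is a minimal sketch against the question's sample data; the table variable @tbl and the filled alias are just names made up for this illustration. Each non-null row bumps the running count, so every NULL row lands in the same group as the last non-null row before it.

```sql
-- Hypothetical demo table holding the question's sample data
DECLARE @tbl TABLE (id INT PRIMARY KEY, value INT NULL);
INSERT INTO @tbl (id, value) VALUES
(1,5),(2,4),(3,1),(4,NULL),(5,NULL),(6,14),(7,NULL),(8,0),(9,3),(10,NULL);

SELECT id,
       value,
       grp,                                           -- running count of non-null rows
       MAX(value) OVER (PARTITION BY grp) AS filled   -- the single non-null value in each group
FROM (
    SELECT id,
           value,
           SUM(CASE WHEN value IS NOT NULL THEN 1 ELSE 0 END)
               OVER (ORDER BY id) AS grp
    FROM @tbl
) AS t
ORDER BY id;
```

For the sample data grp comes out as 1, 2, 3, 3, 3, 4, 4, 5, 6, 6 for ids 1 to 10. Each group holds exactly one non-null value, which is why max (or min) picks it out.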

Answer 1: (score: 2)

Itzik Ben-Gan has covered the best way to do this here: The Last non NULL Puzzle

Below is the solution; on 10 million rows it completes in 20 seconds on my system:

SELECT
  id,
  value1,
  CAST(
  SUBSTRING(
  MAX(CAST(id AS binary(4)) + CAST(value1 AS binary(4)))
  OVER (ORDER BY id
  ROWS UNBOUNDED PRECEDING),
  5, 4)
  AS int) AS lastval
FROM dbo.T1;

This solution assumes your id column is indexed.
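For anyone puzzled by the binary casts: the idea is to glue id and value into one 8-byte string, take the running MAX of that, and then cut the value back out of the last 4 bytes. A row with a NULL value produces a NULL string, which MAX skips. Below is a rough sketch of the same trick applied to the question's table instead of dbo.T1; the table variable @tbl is made up for the demo, and I'm assuming id is non-negative so that the binary ordering matches the numeric ordering.

```sql
-- Hypothetical demo table holding the question's sample data
DECLARE @tbl TABLE (id INT PRIMARY KEY, value INT NULL);
INSERT INTO @tbl (id, value) VALUES
(1,5),(2,4),(3,1),(4,NULL),(5,NULL),(6,14),(7,NULL),(8,0),(9,3),(10,NULL);

SELECT id,
       value,
       CAST(
         SUBSTRING(
           -- bytes 1-4 = id, bytes 5-8 = value; NULL value makes the whole string NULL
           MAX(CAST(id AS BINARY(4)) + CAST(value AS BINARY(4)))
               OVER (ORDER BY id ROWS UNBOUNDED PRECEDING),
           5, 4)                 -- keep only the value part
       AS INT) AS lastval
FROM @tbl
ORDER BY id;
```

Because id occupies the leading bytes, the running MAX is always the string from the latest row that had a non-null value, so the value bytes themselves never affect which row wins.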

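Answer 2: (score: 0)

If the NULLs are scattered, I use a WHILE loop to fill them in.

But if the NULLs come in longer consecutive runs, there are faster ways.

So here is one way: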

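First, find the records we want to update: rows where the value is NULL and the value in the previous row is not NULL.

SELECT C.VALUE, N.ID
FROM tbl C               -- tbl is a placeholder for your table name
INNER JOIN tbl N
ON C.ID + 1 = N.ID       -- assumes consecutive ids with no gaps
WHERE C.VALUE IS NOT NULL
AND N.VALUE IS NULL;

Use that to update (I'm a bit hazy on this syntax, but you get the idea):

UPDATE N
SET VALUE = C.Value
FROM tbl C
INNER JOIN tbl N
ON C.ID + 1 = N.ID
WHERE C.VALUE IS NOT NULL
AND N.VALUE IS NULL;

...now just keep doing that until you run out of rows to update:

-- This is needed to set @@ROWCOUNT to non zero
SELECT 1;

WHILE @@ROWCOUNT <> 0
BEGIN
    -- Each pass fills one more row of every NULL run
    UPDATE N
    SET VALUE = C.Value
    FROM tbl C
    INNER JOIN tbl N
    ON C.ID + 1 = N.ID
    WHERE C.VALUE IS NOT NULL
    AND N.VALUE IS NULL;
END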

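Another approach is to use a similar query to get the ranges of ids to update. That will work a lot faster if your NULLs usually occur in runs of consecutive ids.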
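The range idea isn't spelled out above, so here is one possible sketch of it, not necessarily what the answerer had in mind. It finds each non-null row that is directly followed by a NULL, looks up the next non-null id after it, and fills everything in between in a single UPDATE. The temp table #demo and the aliases start_id and next_id are invented for the example, and like the loop it assumes consecutive ids.

```sql
IF OBJECT_ID('tempdb..#demo') IS NOT NULL DROP TABLE #demo;
CREATE TABLE #demo (id INT PRIMARY KEY, value INT NULL);
INSERT INTO #demo (id, value) VALUES
(1,5),(2,4),(3,1),(4,NULL),(5,NULL),(6,14),(7,NULL),(8,0),(9,3),(10,NULL);

UPDATE n
SET    value = r.value
FROM   #demo AS n
JOIN  (
        -- one row per NULL run: the non-null row just before it
        -- and the id of the next non-null row (NULL if the run reaches the end)
        SELECT c.id AS start_id,
               c.value,
               (SELECT MIN(x.id)
                FROM   #demo AS x
                WHERE  x.id > c.id AND x.value IS NOT NULL) AS next_id
        FROM   #demo AS c
        JOIN   #demo AS f ON f.id = c.id + 1
        WHERE  c.value IS NOT NULL
          AND  f.value IS NULL
      ) AS r
  ON   n.id > r.start_id
 AND  (r.next_id IS NULL OR n.id < r.next_id)
WHERE  n.value IS NULL;

SELECT id, value FROM #demo ORDER BY id;
```

On the sample data this fills ids 4 and 5 with 1, id 7 with 14, and id 10 with 3 in one pass, where the loop needs one pass per row of the longest run.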

Answer 3: (score: 0)

Here is a simple way using OUTER APPLY:

CREATE TABLE #table(id INT, value INT)
INSERT INTO #table VALUES 
(1,5),
(2,4),
(3,1),
(4,NULL),
(5,NULL),
(6,14),
(7,NULL),
(8,0),
(9,3),
(10,NULL)

SELECT t.id, ISNULL(t.value, t3.value) value
FROM #table t
OUTER APPLY(SELECT id FROM #table WHERE id = t.id AND VALUE IS NULL) t2
OUTER APPLY(SELECT TOP 1 value 
            FROM #table WHERE id <= t2.id AND VALUE IS NOT NULL ORDER BY id DESC) t3

Output:

id  VALUE
---------
1   5
2   4
3   1
4   1
5   1
6   14
7   14
8   0
9   3
10  3

Answer 4: (score: 0)

Using this sample data:

if object_id('tempdb..#t1') is not null drop table #t1;
create table #t1 (id int primary key, [value] int null);
insert #t1 values(1,5),(2,4),(3,1),(4,NULL),(5,NULL),(6,14),(7,NULL),(8,0),(9,3),(10,NULL);

I came up with:

with x(id, [value], grouper) as (
select *, row_number() over (order by id)-sum(iif([value] is null,1,0)) over (order by id)
from #t1)
select id, min([value]) over (partition by grouper)
from x;
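
However, I noticed that Vamsi Prabhala beat me to it... my solution is the same as what he posted (arghhhh!). So I thought I'd try a recursive solution. Here is a pretty efficient use of a recursive CTE (provided that id is indexed):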

with sorted as (select *, seqid = row_number() over (order by id) from #t1),
firstRecord as (select top(1) * from #t1 order by id),
prev as
(
  select t.id, t.[value], lastid = 1, lastvalue = null
  from sorted t
  where t.id = 1
  union all
  select t2.id, t2.[value], lastid+1, isnull(prev.[value],lastvalue)
  from sorted t2
  join prev on t2.id = prev.lastid+1
)
select id, [value]=isnull([value],lastvalue)--, *
from prev;

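Normally I'm not a fan of recursive CTEs (rCTE for short), but in this case it offers an elegant solution and is faster than using window aggregate functions (sum, min, ...). Note the execution plans, with the rCTE at the bottom. The rCTE gets it done with two index seeks, one of which is for just one row. Unlike the window aggregate solution, the rCTE does not require a sort. Run this with statistics io on; the rCTE generates far less IO.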

[Image: execution plans for the two queries, rCTE at the bottom]
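If you want to check the IO numbers yourself, something along these lines should do it. This is only a sketch that wraps the window-aggregate query from this answer in SET STATISTICS IO; the rCTE query can be dropped between the same two SET statements for comparison. The read counts show up on the Messages tab in SSMS.

```sql
-- Sample data as used in this answer
if object_id('tempdb..#t1') is not null drop table #t1;
create table #t1 (id int primary key, [value] int null);
insert #t1 values(1,5),(2,4),(3,1),(4,NULL),(5,NULL),(6,14),(7,NULL),(8,0),(9,3),(10,NULL);

set statistics io on;   -- report logical/physical reads per statement

-- Window-aggregate version from this answer
with x(id, [value], grouper) as (
  select id, [value],
         row_number() over (order by id)
           - sum(iif([value] is null,1,0)) over (order by id)
  from #t1)
select id, [value] = min([value]) over (partition by grouper)
from x;

set statistics io off;
```

With only ten rows the difference will be tiny, so load a larger table before drawing any conclusions.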

All that said, don't use either of these solutions; what TheGameiswar posted will perform the best. His solution on a properly indexed id column will be lightning fast.

Answer 5: (score: 0)

Don't worry... here is the answer for you :)

SELECT *
INTO   #TempIsNOtNull
FROM   YourTable
WHERE  value IS NOT NULL

SELECT *
INTO   #TempIsNull
FROM   YourTable
WHERE  value IS NULL

UPDATE YourTable
SET    YourTable.value = UpdateDtls.value
FROM   YourTable
JOIN   (
          SELECT OuterTab1.id,
                 #TempIsNOtNull.value
          FROM   #TempIsNull OuterTab1
          CROSS  JOIN #TempIsNOtNull
          WHERE  OuterTab1.id - #TempIsNOtNull.id > 0
                 AND (OuterTab1.id - #TempIsNOtNull.id) = (
                        SELECT TOP 1 OuterTab1.id - #TempIsNOtNull.id
                        FROM   #TempIsNull InnerTab
                        CROSS  JOIN #TempIsNOtNull
                        WHERE  OuterTab1.id - #TempIsNOtNull.id > 0
                               AND OuterTab1.id = InnerTab.id
                        ORDER  BY (OuterTab1.id - #TempIsNOtNull.id) ASC)
       ) AS UpdateDtls
  ON   (YourTable.id = UpdateDtls.id)

Answer 6: (score: 0)

You can also try using a correlated subquery:

select id,
       case when value is not null then value else
       (select top 1 value from tbl   -- tbl is a placeholder for your table name
        where id < t.id and value is not null order by id desc) end value
from tbl t

Result:

id  value
1   5
2   4
3   1
4   1
5   1
6   14
7   14
8   0
9   3
10  3

Answer 7: (score: 0)

You can use an UPDATE statement; please test it before using:

update #table
set value = newvalue
from (
    select 
    s.id, s.value,
    (select top 1 t.value from #table t where t.id <= s.id and t.value is not null order by t.id desc) as newvalue
    from #table S
) u
where #table.id = u.id and #table.value is null