I have a table in which some values need attention:
| ID | AddedDate |
|---------|-------------|
| 1 | 2010-04-01 |
| 2 | 2010-04-01 |
| 3 | 2010-04-02 |
| 4 | 2010-04-02 |
| 5 | NULL | <----------- needs attention
| 6 | 2010-04-02 |
| 7 | 2010-04-03 |
| 8 | 2010-04-04 |
| 9 | 2010-04-04 |
| 2432659 | 2016-06-15 |
| 2432650 | 2016-06-16 |
| 2432651 | 2016-06-17 |
| 2432672 | 2016-06-18 |
| 2432673 | NULL | <----------- needs attention
| 2432674 | 2016-06-20 |
| 2432685 | 2016-06-21 |
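I want to select the rows where AddedDate is NULL, together with the rows around them. For this example question, rows whose ID is within ±3 of the NULL row are enough. That means I want: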
| ID | AddedDate |
|---------|-------------|
| 2 | 2010-04-01 | ─╮
| 3 | 2010-04-02 | │
| 4 | 2010-04-02 | │
| 5 | NULL | ├──ID values ±3
| 6 | 2010-04-02 | │
| 7 | 2010-04-03 | │
| 8 | 2010-04-04 | ─╯
| 2432672 | 2016-06-18 | ─╮
| 2432673 | NULL | ├──ID values ±3
| 2432674 | 2016-06-20 | ─╯
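Note: in reality this is a 9M-row table, and about 15k rows need attention.
First, I created a query that builds the ranges I am interested in returning: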
SELECT
ID-3 AS [Low ID],
ID+3 AS [High ID]
FROM Items
WHERE AddedDate IS NULL
Low ID High ID
------- -------
2 8
2432670 2432676
My first attempt uses those ranges, and it does work:
WITH dt AS (
SELECT ID-3 AS Low, ID+3 AS High
FROM Items
WHERE AddedDate IS NULL
)
SELECT * FROM Items
WHERE EXISTS(
SELECT 1 FROM dt
WHERE Items.ID BETWEEN dt.Low AND dt.High)
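For anyone who wants to reproduce this, here is a minimal sketch that loads the sample rows and runs both queries. It uses SQLite through Python's sqlite3 module as a stand-in for the real server, which is an assumption on my part (the bracketed column aliases suggest SQL Server); the aliases Low and High replace [Low ID] and [High ID] for brevity.

```python
import sqlite3

# Sample rows copied from the table above; None stands for NULL.
ROWS = [
    (1, "2010-04-01"), (2, "2010-04-01"), (3, "2010-04-02"),
    (4, "2010-04-02"), (5, None), (6, "2010-04-02"),
    (7, "2010-04-03"), (8, "2010-04-04"), (9, "2010-04-04"),
    (2432659, "2016-06-15"), (2432650, "2016-06-16"),
    (2432651, "2016-06-17"), (2432672, "2016-06-18"),
    (2432673, None), (2432674, "2016-06-20"), (2432685, "2016-06-21"),
]

# In-memory SQLite database standing in for the real table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (ID INTEGER PRIMARY KEY, AddedDate TEXT)")
conn.executemany("INSERT INTO Items VALUES (?, ?)", ROWS)

# Step 1: one ID range per row that needs attention.
RANGE_SQL = """
SELECT ID - 3 AS Low, ID + 3 AS High
FROM Items
WHERE AddedDate IS NULL
ORDER BY ID
"""

# Step 2: every row whose ID falls inside at least one of those ranges.
EXISTS_SQL = """
WITH dt AS (
    SELECT ID - 3 AS Low, ID + 3 AS High
    FROM Items
    WHERE AddedDate IS NULL
)
SELECT ID FROM Items
WHERE EXISTS (
    SELECT 1 FROM dt
    WHERE Items.ID BETWEEN dt.Low AND dt.High)
ORDER BY ID
"""

ranges = conn.execute(RANGE_SQL).fetchall()
ids = [row[0] for row in conn.execute(EXISTS_SQL)]
print(ranges)  # [(2, 8), (2432670, 2432676)]
print(ids)     # [2, 3, 4, 5, 6, 7, 8, 2432672, 2432673, 2432674]
```

On the sample data this returns the ten rows listed in the desired output; it says nothing about how the query behaves at 9M rows.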
But when I try it on the real data, it is slow; there is probably a more efficient way.
Answer 0: (score: 4)
This is your existing logic rewritten using a moving max:
WITH dt AS (
SELECT
ID, AddedDate,
-- check if there's a NULL within a range of +/- 3 rows
-- and remember its ID
max(case when AddedDate is null then id end)
over (order by id
rows between 3 preceding and 3 following) as NullID
FROM Items
)
SELECT *
FROM dt
where id between NullID-3 and NullID+3
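A quick way to check the moving-max query against the sample rows is sketched below. It runs the same SQL in SQLite through Python's sqlite3 module, which is an assumption standing in for the real server, and it needs a SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Sample rows from the question; None stands for NULL.
ROWS = [
    (1, "2010-04-01"), (2, "2010-04-01"), (3, "2010-04-02"),
    (4, "2010-04-02"), (5, None), (6, "2010-04-02"),
    (7, "2010-04-03"), (8, "2010-04-04"), (9, "2010-04-04"),
    (2432659, "2016-06-15"), (2432650, "2016-06-16"),
    (2432651, "2016-06-17"), (2432672, "2016-06-18"),
    (2432673, None), (2432674, "2016-06-20"), (2432685, "2016-06-21"),
]

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.execute("CREATE TABLE Items (ID INTEGER PRIMARY KEY, AddedDate TEXT)")
conn.executemany("INSERT INTO Items VALUES (?, ?)", ROWS)

# NullID is the largest ID with a NULL AddedDate among the current row and
# its three neighbours on either side (by row position, ordered by ID).
# The outer filter then keeps only rows whose ID value is within 3 of it.
MOVING_MAX_SQL = """
WITH dt AS (
    SELECT ID, AddedDate,
           MAX(CASE WHEN AddedDate IS NULL THEN ID END)
               OVER (ORDER BY ID
                     ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS NullID
    FROM Items
)
SELECT ID, NullID
FROM dt
WHERE ID BETWEEN NullID - 3 AND NullID + 3
ORDER BY ID
"""

result = conn.execute(MOVING_MAX_SQL).fetchall()
moving_max_ids = [row[0] for row in result]
print(moving_max_ids)
```

Rows with no NULL in their window get a NULL NullID, so the BETWEEN test is not true for them and they drop out. Rows such as 2432685 do see the NULL at 2432673 in their window but fail the ID-value filter, so on the sample data the result matches the desired output.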
Answer 1: (score: 3)
Here is one approach that uses window clauses:
select i.*
from (select i.*,
count(*) over (order by id rows between 3 preceding and 1 preceding) as cnt_prec,
count(*) over (order by id rows between 1 following and 3 following) as cnt_foll,
count(addeddate) over (order by id rows between 3 preceding and 1 preceding) as cnt_ad_prec,
count(addeddate) over (order by id rows between 1 following and 3 following) as cnt_ad_foll
from items
) i
where cnt_ad_prec <> cnt_prec or
cnt_ad_foll <> cnt_foll or
addeddate is null
order by id;
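This returns all rows that have NULL in the column or that are within three rows of a NULL.
The comparison against the counts is needed to avoid edge problems at the smallest and largest IDs.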
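A small sketch of this counting approach on the question's sample rows follows, run in SQLite through Python's sqlite3 module (an assumption standing in for the real server; window functions need SQLite 3.25 or later). It is worth noting that "within three rows" is measured by row position, not by ID value, so where the IDs have gaps the result is wider than the ±3 ID range in the question.

```python
import sqlite3

# Sample rows from the question; None stands for NULL.
ROWS = [
    (1, "2010-04-01"), (2, "2010-04-01"), (3, "2010-04-02"),
    (4, "2010-04-02"), (5, None), (6, "2010-04-02"),
    (7, "2010-04-03"), (8, "2010-04-04"), (9, "2010-04-04"),
    (2432659, "2016-06-15"), (2432650, "2016-06-16"),
    (2432651, "2016-06-17"), (2432672, "2016-06-18"),
    (2432673, None), (2432674, "2016-06-20"), (2432685, "2016-06-21"),
]

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.execute("CREATE TABLE Items (ID INTEGER PRIMARY KEY, AddedDate TEXT)")
conn.executemany("INSERT INTO Items VALUES (?, ?)", ROWS)

# count(*) counts rows in the frame, count(AddedDate) counts non-NULL dates.
# If the two differ, the frame contains at least one NULL AddedDate.
COUNT_SQL = """
SELECT ID
FROM (SELECT i.*,
             count(*) OVER (ORDER BY ID ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS cnt_prec,
             count(*) OVER (ORDER BY ID ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING) AS cnt_foll,
             count(AddedDate) OVER (ORDER BY ID ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS cnt_ad_prec,
             count(AddedDate) OVER (ORDER BY ID ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING) AS cnt_ad_foll
      FROM Items i) i
WHERE cnt_ad_prec <> cnt_prec
   OR cnt_ad_foll <> cnt_foll
   OR AddedDate IS NULL
ORDER BY ID
"""

row_based_ids = [row[0] for row in conn.execute(COUNT_SQL)]
print(row_based_ids)
```

On the sample data this also returns 2432651, 2432659 and 2432685, because they sit within three rows of 2432673 even though their ID values are more than 3 away. If the requirement really is ±3 on the ID value, an extra filter on ID would be needed.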
Answer 2: (score: 3)
Another way:
SELECT i1.*
FROM Items i1, Items i2
WHERE i2.AddedDate IS NULL AND ABS(i1.ID - i2.ID) <= 3
I hope there is an index on the AddedDate column.
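Here is a sketch of the self-join on the sample rows, again in SQLite through Python's sqlite3 module (an assumption standing in for the real server). The index name IX_Items_AddedDate is hypothetical. The second query is a variant of my own, not part of the answer: it writes the same condition as a range on i1.ID, which generally gives an optimizer a better chance of seeking on the ID index than wrapping the column in ABS().

```python
import sqlite3

# Sample rows from the question; None stands for NULL.
ROWS = [
    (1, "2010-04-01"), (2, "2010-04-01"), (3, "2010-04-02"),
    (4, "2010-04-02"), (5, None), (6, "2010-04-02"),
    (7, "2010-04-03"), (8, "2010-04-04"), (9, "2010-04-04"),
    (2432659, "2016-06-15"), (2432650, "2016-06-16"),
    (2432651, "2016-06-17"), (2432672, "2016-06-18"),
    (2432673, None), (2432674, "2016-06-20"), (2432685, "2016-06-21"),
]

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.execute("CREATE TABLE Items (ID INTEGER PRIMARY KEY, AddedDate TEXT)")
conn.executemany("INSERT INTO Items VALUES (?, ?)", ROWS)
# Hypothetical index so the NULL rows can be located without a full scan.
conn.execute("CREATE INDEX IX_Items_AddedDate ON Items (AddedDate)")

# The join as written in the answer.
ABS_SQL = """
SELECT i1.ID
FROM Items i1, Items i2
WHERE i2.AddedDate IS NULL AND ABS(i1.ID - i2.ID) <= 3
ORDER BY i1.ID
"""

# Variant (my assumption, not from the answer): same condition as a range
# on i1.ID, plus DISTINCT in case two NULL rows have overlapping ranges.
RANGE_SQL = """
SELECT DISTINCT i1.ID
FROM Items i2
JOIN Items i1 ON i1.ID BETWEEN i2.ID - 3 AND i2.ID + 3
WHERE i2.AddedDate IS NULL
ORDER BY i1.ID
"""

abs_ids = [row[0] for row in conn.execute(ABS_SQL)]
range_ids = [row[0] for row in conn.execute(RANGE_SQL)]
print(abs_ids == range_ids, range_ids)
```

Both forms return the same ten IDs on the sample data. Whether either one is fast enough on 9M rows depends on the real execution plan, which I have not measured.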
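Answer 3: (score: 1)
Trying a different approach from the other answers: how about using a table variable to store the IDs you want, and then joining to it? I would expect the INSERT to run fast enough, and the SELECT can then take advantage of the clustered index on Items. Unfortunately, I don't have your data to test how efficient it is: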
DECLARE @userData TABLE(
idInRange int NOT NULL
)
INSERT INTO @userData (idInRange)
SELECT DISTINCT i.Id + r
FROM Items i
CROSS JOIN (
SELECT -3 as r UNION ALL SELECT -2 as r UNION ALL SELECT -1 as r UNION ALL SELECT 0 as r UNION ALL
SELECT 1 as r UNION ALL SELECT 2 as r UNION ALL SELECT 3 as r
) yourRange
WHERE AddedDate IS NULL;
SELECT i.*
FROM @userData u
INNER JOIN Items i ON i.ID = u.idInRange
Edit: added DISTINCT when populating the table variable, to avoid duplicate rows in case there are two consecutive NULL dates and their ID ranges overlap.
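To illustrate why the DISTINCT matters, here is a minimal sketch using SQLite through Python's sqlite3 module. SQLite has no table variables, so a temporary table named userData stands in for @userData; that substitution, the helper name ids_near_nulls, and the twelve-row test data with two adjacent NULLs are all assumptions made for the example.

```python
import sqlite3

def ids_near_nulls(rows, distinct=True):
    """Stage the wanted IDs in a temp table, then join back to Items.

    Returns (number of staged IDs, list of IDs returned by the join).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Items (ID INTEGER PRIMARY KEY, AddedDate TEXT)")
    conn.executemany("INSERT INTO Items VALUES (?, ?)", rows)
    # Temp table standing in for the T-SQL table variable (assumption).
    conn.execute("CREATE TEMP TABLE userData (idInRange INTEGER NOT NULL)")
    keyword = "DISTINCT" if distinct else ""
    conn.execute(f"""
        INSERT INTO userData (idInRange)
        SELECT {keyword} i.ID + yourRange.r
        FROM Items i
        CROSS JOIN (
            SELECT -3 AS r UNION ALL SELECT -2 UNION ALL SELECT -1 UNION ALL
            SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
        ) yourRange
        WHERE i.AddedDate IS NULL
    """)
    staged = conn.execute("SELECT count(*) FROM userData").fetchone()[0]
    ids = [row[0] for row in conn.execute("""
        SELECT i.ID
        FROM userData u
        INNER JOIN Items i ON i.ID = u.idInRange
        ORDER BY i.ID
    """)]
    conn.close()
    return staged, ids

# IDs 1..12 with NULL dates on the adjacent rows 5 and 6 (made-up data).
rows = [(i, None if i in (5, 6) else "2010-04-01") for i in range(1, 13)]

with_distinct = ids_near_nulls(rows, distinct=True)
without_distinct = ids_near_nulls(rows, distinct=False)
print(with_distinct)
print(without_distinct[0], len(without_distinct[1]))
```

With DISTINCT the staging table holds 8 IDs (2 through 9) and the join returns each row once. Without it, the overlapping ranges stage 14 IDs and the join returns 14 rows, with IDs 3 through 8 appearing twice.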