Skipping consecutive rows after a specific value

Date: 2017-09-05 12:59:24

Tags: sql sql-server tsql sql-server-2014

Note: I have a working query, but I'm looking for optimizations so I can use it on large tables.

Suppose I have a table like this:

id  session_id  value
1       5           7
2       5           1
3       5           1
4       5           12
5       5           1
6       5           1
7       5           1
8       6           7
9       6           1
10      6           3
11      6           1
12      7           7
13      8           1
14      8           2
15      8           3

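I want the ids of all rows with value 1, with one exception: skip any group of 1s that directly follows a value of 7 within the same session_id.

Basically, I look for groups of 1s that directly follow a 7, scoped to session_id, and ignore those groups. Then I show all the remaining rows with value 1.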
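To pin the rule down, here is a small Python reference implementation of the desired filter. It is only a sketch for checking results, and it assumes that rows are ordered by id within each session_id and that a "group" means a run of consecutive 1s; the function name wanted_ids is hypothetical.

```python
# Sample rows from the question: (id, session_id, value)
ROWS = [(1, 5, 7), (2, 5, 1), (3, 5, 1), (4, 5, 12), (5, 5, 1), (6, 5, 1),
        (7, 5, 1), (8, 6, 7), (9, 6, 1), (10, 6, 3), (11, 6, 1), (12, 7, 7),
        (13, 8, 1), (14, 8, 2), (15, 8, 3)]


def wanted_ids(rows):
    """Ids of value-1 rows, skipping any run of 1s that directly follows
    a 7 in the same session_id."""
    result = []
    prev_session = None
    prev_value = None
    skipping = False  # True while inside a run of 1s that started right after a 7
    # Walk each session in id order
    for row_id, session_id, value in sorted(rows, key=lambda r: (r[1], r[0])):
        if session_id != prev_session:
            prev_value, skipping = None, False  # new session: no preceding row
        if value == 1:
            if prev_value == 7:
                skipping = True  # the first 1 after a 7 starts a skipped run
            if not skipping:
                result.append(row_id)
        else:
            skipping = False  # any non-1 value ends the current run
        prev_session, prev_value = session_id, value
    return sorted(result)


print(wanted_ids(ROWS))  # [5, 6, 7, 11, 13]
```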

Desired output (ids):

5
6
7
11
13

I got some inspiration from this post and ended up with this code:

declare @req_data table (
    id int primary key identity,
    session_id int,
    value int
)

insert into @req_data(session_id, value) values (5, 7)
insert into @req_data(session_id, value) values (5, 1)  -- preceded by value 7 in same session, should be ignored
insert into @req_data(session_id, value) values (5, 1)  -- ignore this one too
insert into @req_data(session_id, value) values (5, 12)
insert into @req_data(session_id, value) values (5, 1)  -- preceded by value != 7, show this
insert into @req_data(session_id, value) values (5, 1)  -- show this too
insert into @req_data(session_id, value) values (5, 1)  -- show this too
insert into @req_data(session_id, value) values (6, 7)
insert into @req_data(session_id, value) values (6, 1)  -- preceded by value 7 in same session, should be ignored
insert into @req_data(session_id, value) values (6, 3)
insert into @req_data(session_id, value) values (6, 1)  -- preceded by value != 7, show this
insert into @req_data(session_id, value) values (7, 7)
insert into @req_data(session_id, value) values (8, 1)  -- new session_id, show this
insert into @req_data(session_id, value) values (8, 2)
insert into @req_data(session_id, value) values (8, 3)



select id
from (
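    -- a run of 1s is skipped if any of its rows directly follows a 7
    select session_id, id, max(skip) over (partition by session_id, value, grp) as 'skip'
    from (
        select tWithGroups.*,
            -- gaps-and-islands: constant along a run of equal values within one session
            ( row_number() over (partition by session_id order by id) - row_number() over (partition by session_id, value order by id) ) as grp
        from (
            select session_id, id, value,
                case
                    -- order by id: ordering by session_id inside a session_id partition is nondeterministic
                    when lag(value) over (partition by session_id order by id) = 7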
                        then 1
                    else 0
                end as 'skip'
            from @req_data
        ) as  tWithGroups
    ) as tWithSkipField
    where tWithSkipField.value = 1
) as tYetAnotherOutput
where skip != 1
order by id

This gives the desired result, but it has four nested SELECT blocks, which I think will be too inefficient on large tables.
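The query above is a gaps-and-islands pattern: flag rows whose predecessor is 7, number the runs, then spread the flag over each run. The Python sketch below mimics that logic row by row. It assumes the lag is taken per session_id in id order and that a run is keyed by (session_id, value, grp); it models the idea, not SQL Server's execution plan, and the name islands_filter is hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (id, session_id, value)
ROWS = [(1, 5, 7), (2, 5, 1), (3, 5, 1), (4, 5, 12), (5, 5, 1), (6, 5, 1),
        (7, 5, 1), (8, 6, 7), (9, 6, 1), (10, 6, 3), (11, 6, 1), (12, 7, 7),
        (13, 8, 1), (14, 8, 2), (15, 8, 3)]


def islands_filter(rows):
    last_value = {}                      # lag(value) per session_id, in id order
    rn_session = defaultdict(int)        # row_number() per session_id
    rn_session_value = defaultdict(int)  # row_number() per (session_id, value)
    tagged = []
    for row_id, session_id, value in sorted(rows):  # global id order
        skip = 1 if last_value.get(session_id) == 7 else 0
        last_value[session_id] = value
        rn_session[session_id] += 1
        rn_session_value[(session_id, value)] += 1
        # The difference stays constant along a run of equal values in one session
        grp = rn_session[session_id] - rn_session_value[(session_id, value)]
        tagged.append((row_id, value, skip, (session_id, value, grp)))
    # max(skip) per run: one flagged row skips the whole run
    island_skip = defaultdict(int)
    for _, _, skip, island in tagged:
        island_skip[island] = max(island_skip[island], skip)
    return [row_id for row_id, value, _, island in tagged
            if value == 1 and island_skip[island] == 0]


print(islands_filter(ROWS))  # [5, 6, 7, 11, 13]
```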

Is there a cleaner, faster way to do this?

3 answers:

Answer 0 (score: 2)

The following works for this.

WITH
    cte_ControlValue AS (
        SELECT 
            rd.id, rd.session_id, rd.value,
            ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
        FROM
            @req_data rd
            CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
        )
SELECT 
    cv.id, cv.session_id, cv.value
FROM
    cte_ControlValue cv
WHERE 
    cv.value = 1
    AND cv.ControlValue <> 7;

Results...

id          session_id  value
----------- ----------- -----------
5           5           1
6           5           1
7           5           1
11          6           1
13          8           1

Edit: how and why it works... The basic premise is taken from Itzik Ben-Gan's "The Last non NULL Puzzle".

Basically, we rely on two behaviors that most people don't usually think about...

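1) NULL + anything = NULL.
2) You can CAST or CONVERT an INT to a fixed-length BINARY data type and it will continue to sort like an INT (rather than like a text string), as long as the values are non-negative, which is the case for an identity id.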
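Both behaviors can be illustrated outside SQL Server. The Python sketch below assumes that CAST(n AS BINARY(4)) produces 4 big-endian two's-complement bytes (consistent with the 0x00000001... values in the results further down) and that binary values compare byte by byte; the helper names to_binary4 and concat are hypothetical.

```python
def to_binary4(n):
    # Modeled on CAST(n AS BINARY(4)): 4 bytes, big-endian, two's complement
    return n.to_bytes(4, byteorder="big", signed=True)


def concat(a, b):
    # Models NULL + anything = NULL for binary concatenation
    return None if a is None or b is None else a + b


ids = [2, 10, 1, 256, 255]
# Byte-wise ordering of the fixed-width encodings matches numeric ordering
print([int.from_bytes(b, "big") for b in sorted(to_binary4(i) for i in ids)])
# [1, 2, 10, 255, 256]
# Text ordering does not
print(sorted(str(i) for i in ids))  # ['1', '10', '2', '255', '256']
print(concat(to_binary4(4), None))  # None
# The caveat: a negative number's high bit makes it sort after every positive one
print(to_binary4(-1) > to_binary4(1))  # True
```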

This is easier to see when the intermediate steps are added to the query inside the CTE...

SELECT 
    rd.id, rd.session_id, rd.value, 
    bv.BinVal,
    SmearedBinVal = MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id),
    SecondHalfAsINT = CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT),
    ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
    @req_data rd
    CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)

Results...

id          session_id  value       BinVal             SmearedBinVal      SecondHalfAsINT ControlValue
----------- ----------- ----------- ------------------ ------------------ --------------- ------------
1           5           7           0x0000000100000007 0x0000000100000007 7               7
2           5           1           NULL               0x0000000100000007 7               7
3           5           1           NULL               0x0000000100000007 7               7
4           5           12          0x000000040000000C 0x000000040000000C 12              12
5           5           1           NULL               0x000000040000000C 12              12
6           5           1           NULL               0x000000040000000C 12              12
7           5           1           NULL               0x000000040000000C 12              12
8           6           7           0x0000000800000007 0x0000000800000007 7               7
9           6           1           NULL               0x0000000800000007 7               7
10          6           3           0x0000000A00000003 0x0000000A00000003 3               3
11          6           1           NULL               0x0000000A00000003 3               3
12          7           7           0x0000000C00000007 0x0000000C00000007 7               7
13          8           1           NULL               NULL               NULL            999
14          8           2           0x0000000E00000002 0x0000000E00000002 2               2
15          8           3           0x0000000F00000003 0x0000000F00000003 3               3

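Looking at the BinVal column, we see an 8-byte hex value for every row where [value] <> 1, and NULL where [value] = 1. The first 4 bytes are the id (used for sorting) and the second 4 bytes are [value] (used to carry the "previous non-1 value", or to make the whole thing NULL).

The second step is to "smear" the non-NULL values over the NULLs, using a windowed MAX function partitioned by session_id and ordered by id.

The third step is to parse out the last 4 bytes, convert them back to INT (SecondHalfAsINT), and handle any NULLs that result from there being no preceding non-1 value (ControlValue).

Since we can't reference a window function in the WHERE clause, we have to wrap the query in a CTE (a derived table works just as well) so that we can use the new ControlValue in the WHERE clause.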
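The whole "last non-NULL" trick can be modeled in Python by packing the id into the high 32 bits and the value into the low 32 bits of one integer, which plays the role of BinVal. This is a sketch under the assumption that ids and values are non-negative and fit in 32 bits; the name control_values is hypothetical. Because ids only grow within a session, the running maximum of the packed key is always the latest non-1 row, and its low 32 bits give that row's value.

```python
# Sample rows from the question: (id, session_id, value)
ROWS = [(1, 5, 7), (2, 5, 1), (3, 5, 1), (4, 5, 12), (5, 5, 1), (6, 5, 1),
        (7, 5, 1), (8, 6, 7), (9, 6, 1), (10, 6, 3), (11, 6, 1), (12, 7, 7),
        (13, 8, 1), (14, 8, 2), (15, 8, 3)]


def control_values(rows, default=999):
    """Per row id: the value of the latest non-1 row (itself included) in the
    same session, or `default` when there is none (the ControlValue column)."""
    running_max = {}  # session_id -> largest packed key so far (the "smear")
    out = {}
    for row_id, session_id, value in sorted(rows):  # id order
        # BinVal: id in the high 32 bits, value in the low 32 bits; None when value = 1
        packed = None if value == 1 else (row_id << 32) | value
        best = running_max.get(session_id)
        if packed is not None and (best is None or packed > best):
            best = packed
            running_max[session_id] = best
        # SecondHalfAsINT, then ISNULL(..., 999)
        out[row_id] = default if best is None else best & 0xFFFFFFFF
    return out


cv = control_values(ROWS)
print([i for i, _, v in ROWS if v == 1 and cv[i] != 7])  # [5, 6, 7, 11, 13]
```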

Answer 1 (score: 0)

You can use the following query:

select id, session_id, value,
          coalesce(sum(case when value <> 1 then 1 end) 
                   over (partition by session_id order by id), 0) as grp
from @req_data

to get:

id  session_id  value   grp
----------------------------
1   5           7       1
2   5           1       1
3   5           1       1
4   5           12      2
5   5           1       2
6   5           1       2
7   5           1       2
8   6           7       1
9   6           1       1
10  6           3       2
11  6           1       2
12  7           7       1
13  8           1       0
14  8           2       1
15  8           3       2

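So this query detects islands of consecutive 1 records that belong to the same group, where the group is defined by the first preceding row with value <> 1.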
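The running count can be modeled in Python as a sketch: the per-session counter stands in for SUM(CASE WHEN value <> 1 THEN 1 END) OVER (PARTITION BY session_id ORDER BY id), and the default of 0 stands in for the COALESCE when no non-1 row has been seen yet. Rows are assumed to be visited in id order; the name running_groups is hypothetical.

```python
# Sample rows from the question: (id, session_id, value)
ROWS = [(1, 5, 7), (2, 5, 1), (3, 5, 1), (4, 5, 12), (5, 5, 1), (6, 5, 1),
        (7, 5, 1), (8, 6, 7), (9, 6, 1), (10, 6, 3), (11, 6, 1), (12, 7, 7),
        (13, 8, 1), (14, 8, 2), (15, 8, 3)]


def running_groups(rows):
    counts = {}  # session_id -> number of non-1 rows seen so far
    grp = {}
    for row_id, session_id, value in sorted(rows):  # id order
        if value != 1:
            counts[session_id] = counts.get(session_id, 0) + 1
        # A run of 1s inherits the count of the non-1 row that precedes it
        grp[row_id] = counts.get(session_id, 0)
    return grp


g = running_groups(ROWS)
print([g[i] for i in range(1, 16)])
# [1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 0, 1, 2]
```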

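You can use window functions once again to detect all the islands that start with a 7. If you wrap this in a second CTE, you can then get the desired result by filtering out all of those 7 islands.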
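The second-CTE step described above can be sketched as follows. This is not the answer's original query: it is an illustrative version written as SQL and run through Python's sqlite3 module as a stand-in for SQL Server, so it assumes SQLite 3.25 or later (for window functions), a regular table req_data instead of the @req_data table variable, and hypothetical CTE and column names (grp_cte, island_cte, after7). Each island (session_id, grp) contains at most one non-1 row, its leading row, so a windowed MAX over "value = 7" tells whether the island starts with a 7.

```python
import sqlite3

# Sample rows from the question: (id, session_id, value)
ROWS = [(1, 5, 7), (2, 5, 1), (3, 5, 1), (4, 5, 12), (5, 5, 1), (6, 5, 1),
        (7, 5, 1), (8, 6, 7), (9, 6, 1), (10, 6, 3), (11, 6, 1), (12, 7, 7),
        (13, 8, 1), (14, 8, 2), (15, 8, 3)]

QUERY = """
WITH grp_cte AS (
    SELECT id, session_id, value,
           COALESCE(SUM(CASE WHEN value <> 1 THEN 1 END)
                    OVER (PARTITION BY session_id ORDER BY id), 0) AS grp
    FROM req_data
), island_cte AS (
    SELECT id, value,
           -- 1 when the island's leading row is a 7
           MAX(CASE WHEN value = 7 THEN 1 ELSE 0 END)
               OVER (PARTITION BY session_id, grp) AS after7
    FROM grp_cte
)
SELECT id FROM island_cte
WHERE value = 1 AND after7 = 0
ORDER BY id
"""


def answer1_ids(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE req_data (id INTEGER PRIMARY KEY, session_id INT, value INT)")
        conn.executemany("INSERT INTO req_data (id, session_id, value) VALUES (?, ?, ?)", rows)
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


print(answer1_ids(ROWS))  # [5, 6, 7, 11, 13]
```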

Answer 2 (score: 0)

SELECT CRow.id
FROM @req_data AS CRow
CROSS APPLY (SELECT MAX(id) AS id FROM @req_data PRev WHERE PRev.Id < CRow.id  AND PRev.session_id = CRow.session_id AND  PRev.value <> 1 ) MaxPRow
LEFT JOIN @req_data AS PRow ON MaxPRow.id = PRow.id
WHERE CRow.value = 1 AND ISNULL(PRow.value,1) <> 7
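For each row with value 1, this query looks up the latest earlier non-1 row in the same session (the CROSS APPLY with MAX(id)), joins back to get its value, and keeps the row unless that value is 7; ISNULL(PRow.value, 1) makes a row with no such predecessor pass. A Python sketch of the same lookup follows. It scans all rows for every value-1 row, so it is quadratic and only meant to illustrate the logic; the name previous_non1_filter is hypothetical.

```python
# Sample rows from the question: (id, session_id, value)
ROWS = [(1, 5, 7), (2, 5, 1), (3, 5, 1), (4, 5, 12), (5, 5, 1), (6, 5, 1),
        (7, 5, 1), (8, 6, 7), (9, 6, 1), (10, 6, 3), (11, 6, 1), (12, 7, 7),
        (13, 8, 1), (14, 8, 2), (15, 8, 3)]


def previous_non1_filter(rows):
    by_id = {row_id: (session_id, value) for row_id, session_id, value in rows}
    result = []
    for row_id, session_id, value in sorted(rows):
        if value != 1:  # WHERE CRow.value = 1
            continue
        # MaxPRow: MAX(id) of earlier non-1 rows in the same session
        prev_ids = [pid for pid, (ps, pv) in by_id.items()
                    if pid < row_id and ps == session_id and pv != 1]
        # ISNULL(PRow.value, 1): no predecessor counts as "not 7"
        prev_value = by_id[max(prev_ids)][1] if prev_ids else 1
        if prev_value != 7:
            result.append(row_id)
    return result


print(previous_non1_filter(ROWS))  # [5, 6, 7, 11, 13]
```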