Note: I have a working query, but I am looking for ways to optimize it for use on large tables.
Suppose I have a table like this:
id session_id value
1 5 7
2 5 1
3 5 1
4 5 12
5 5 1
6 5 1
7 5 1
8 6 7
9 6 1
10 6 3
11 6 1
12 7 7
13 8 1
14 8 2
15 8 3
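I want the ids of all rows with value 1, with one exception: skip any group of value-1 rows that directly follows a value 7 within the same session_id.
In other words, I look for groups of value-1 rows that directly follow a value 7 (limited to the same session_id) and ignore those groups. Then I show all remaining value-1 rows.
Desired output, showing ids: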
5
6
7
11
13
I took some inspiration from this post and ended up with this code:
declare @req_data table (
id int primary key identity,
session_id int,
value int
)
insert into @req_data(session_id, value) values (5, 7)
insert into @req_data(session_id, value) values (5, 1) -- preceded by value 7 in same session, should be ignored
insert into @req_data(session_id, value) values (5, 1) -- ignore this one too
insert into @req_data(session_id, value) values (5, 12)
insert into @req_data(session_id, value) values (5, 1) -- preceded by value != 7, show this
insert into @req_data(session_id, value) values (5, 1) -- show this too
insert into @req_data(session_id, value) values (5, 1) -- show this too
insert into @req_data(session_id, value) values (6, 7)
insert into @req_data(session_id, value) values (6, 1) -- preceded by value 7 in same session, should be ignored
insert into @req_data(session_id, value) values (6, 3)
insert into @req_data(session_id, value) values (6, 1) -- preceded by value != 7, show this
insert into @req_data(session_id, value) values (7, 7)
insert into @req_data(session_id, value) values (8, 1) -- new session_id, show this
insert into @req_data(session_id, value) values (8, 2)
insert into @req_data(session_id, value) values (8, 3)
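select id
from (
    select session_id, id,
           -- one skip flag per island of consecutive value = 1 rows within a session
           max(skip) over (partition by session_id, grp) as skip
    from (
        select tWithGroups.*,
               -- islands: the difference of the two row numbers is constant within a run of equal values
               ( row_number() over (partition by session_id order by id)
               - row_number() over (partition by session_id, value order by id) ) as grp
        from (
            select session_id, id, value,
                   case
                       -- order by id so that lag() deterministically returns the previous row of the session
                       when lag(value) over (partition by session_id order by id) = 7
                       then 1
                       else 0
                   end as skip
            from @req_data
        ) as tWithGroups
    ) as tWithSkipField
    where tWithSkipField.value = 1
) as tYetAnotherOutput
where skip != 1
order by id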
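This gives the desired result, but it takes 4 nested select blocks, which I think is too inefficient for large tables.
Is there a cleaner, faster way?
Answer 0 (score: 2)
The following should work for this.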
WITH
cte_ControlValue AS (
SELECT
rd.id, rd.session_id, rd.value,
ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
@req_data rd
CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
)
SELECT
cv.id, cv.session_id, cv.value
FROM
cte_ControlValue cv
WHERE
cv.value = 1
AND cv.ControlValue <> 7;
Result...
id session_id value
----------- ----------- -----------
5 5 1
6 5 1
7 5 1
11 6 1
13 8 1
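Edit: how and why it works... The basic premise is taken from Itzik Ben-Gan's "The Last non NULL Puzzle".
Basically, we rely on two different behaviors that most people don't usually think about...
1) NULL + anything = NULL. 2) You can CAST or CONVERT an INT to a fixed-length BINARY data type and it will continue to sort like an INT (rather than like a text string), as long as the values are non-negative.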
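Both behaviors can be checked in isolation. The following is a minimal sketch, assuming SQL Server with default session settings; the column aliases are made up for illustration and are not part of the answer's query.

```sql
-- 1) Concatenating a binary value with NULL yields NULL.
SELECT CAST(4 AS BINARY(4)) + CAST(NULL AS BINARY(4)) AS NullPropagates;  -- NULL

-- 2) Fixed-length binary keeps integer order for non-negative values,
--    while the same numbers compared as strings do not.
SELECT
    CASE WHEN CAST(9 AS BINARY(4)) < CAST(10 AS BINARY(4)) THEN 'yes' ELSE 'no' END AS BinaryKeepsOrder,  -- yes
    CASE WHEN '9' < '10' THEN 'yes' ELSE 'no' END AS StringKeepsOrder;                                   -- no
```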
This is easier to see when the intermediate steps from the CTE are added to the query output...
SELECT
rd.id, rd.session_id, rd.value,
bv.BinVal,
SmearedBinVal = MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id),
SecondHalfAsINT = CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT),
ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
@req_data rd
CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
Result...
id session_id value BinVal SmearedBinVal SecondHalfAsINT ControlValue
----------- ----------- ----------- ------------------ ------------------ --------------- ------------
1 5 7 0x0000000100000007 0x0000000100000007 7 7
2 5 1 NULL 0x0000000100000007 7 7
3 5 1 NULL 0x0000000100000007 7 7
4 5 12 0x000000040000000C 0x000000040000000C 12 12
5 5 1 NULL 0x000000040000000C 12 12
6 5 1 NULL 0x000000040000000C 12 12
7 5 1 NULL 0x000000040000000C 12 12
8 6 7 0x0000000800000007 0x0000000800000007 7 7
9 6 1 NULL 0x0000000800000007 7 7
10 6 3 0x0000000A00000003 0x0000000A00000003 3 3
11 6 1 NULL 0x0000000A00000003 3 3
12 7 7 0x0000000C00000007 0x0000000C00000007 7 7
13 8 1 NULL NULL NULL 999
14 8 2 0x0000000E00000002 0x0000000E00000002 2 2
15 8 3 0x0000000F00000003 0x0000000F00000003 3 3
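Looking at the BinVal column, we see 8-byte hex values for all rows where [value] <> 1 and NULLs where [value] = 1... The first 4 bytes are the id (used for ordering) and the second 4 bytes are the [value] (used either to carry the "previous non-1 value" or to set the whole thing to NULL).
The second step is to "smear" the non-NULL values into the NULLs using a window-framed MAX function, partitioned by session_id and ordered by id.
The third step is to parse out the last 4 bytes and convert them back to the INT data type (SecondHalfAsINT), and to deal with any NULLs that result from there being no previous non-1 value (ControlValue).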
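To make the packing and parsing concrete, here is a small sketch that builds the 8-byte value for the row with id 4 and value 12 and splits it again. The variable name @packed and the column aliases are illustrative only.

```sql
-- Pack id = 4 and value = 12 into one 8-byte binary, as the CROSS APPLY does.
DECLARE @packed BINARY(8) = CAST(4 AS BINARY(4)) + CAST(12 AS BINARY(4));

SELECT
    @packed                               AS Packed,     -- 0x000000040000000C
    CAST(SUBSTRING(@packed, 1, 4) AS INT) AS IdPart,     -- 4
    CAST(SUBSTRING(@packed, 5, 4) AS INT) AS ValuePart;  -- 12
```

Because the id occupies the leading bytes, MAX over the packed values picks the row with the highest id seen so far, and the trailing bytes carry that row's value along with it.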
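Since we can't reference a window function in the WHERE clause, we have to put the query into a CTE (a derived table would work just as well) so that we can use the new ControlValue in the WHERE clause.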
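For comparison, the derived-table form mentioned above might look like the sketch below. It assumes the @req_data table variable from the question is declared and populated in the same batch; the logic is unchanged from the CTE version.

```sql
SELECT
    cv.id, cv.session_id, cv.value
FROM (
    -- Same projection as cte_ControlValue, written inline as a derived table.
    SELECT
        rd.id, rd.session_id, rd.value,
        ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
    FROM
        @req_data rd
        CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
) cv
WHERE
    cv.value = 1
    AND cv.ControlValue <> 7;
```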
Answer 1 (score: 0)
You can use the following query:
select id, session_id, value,
coalesce(sum(case when value <> 1 then 1 end)
over (partition by session_id order by id), 0) as grp
from @req_data
to get:
id session_id value grp
----------------------------
1 5 7 1
2 5 1 1
3 5 1 1
4 5 12 2
5 5 1 2
6 5 1 2
7 5 1 2
8 6 7 1
9 6 1 1
10 6 3 2
11 6 1 2
12 7 7 1
13 8 1 0
14 8 2 1
15 8 3 2
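So this query detects islands of consecutive 1 records that belong to the same group, as determined by the first preceding row with value <> 1.
You can use a window function once more to detect all the 7 islands, that is, the groups whose leading row has value 7. If you wrap this in a second CTE, you can get the desired result by filtering out all the 7 islands.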
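The final query of this answer is not preserved here. One way the described second step could be written is sketched below; the CTE names cte_grp and cte_flag and the column follows7 are hypothetical, and @req_data is assumed to be the table variable from the question, declared in the same batch.

```sql
-- Leading semicolon: a CTE must follow a terminated statement.
;WITH cte_grp AS (
    -- Running count of non-1 rows per session: every non-1 row opens a new group.
    SELECT id, session_id, value,
           COALESCE(SUM(CASE WHEN value <> 1 THEN 1 END)
                    OVER (PARTITION BY session_id ORDER BY id), 0) AS grp
    FROM @req_data
),
cte_flag AS (
    -- A group contains at most one non-1 row (its leading row),
    -- so this flags exactly the groups that start with a 7.
    SELECT id, value,
           MAX(CASE WHEN value = 7 THEN 1 ELSE 0 END)
               OVER (PARTITION BY session_id, grp) AS follows7
    FROM cte_grp
)
SELECT id
FROM cte_flag
WHERE value = 1
  AND follows7 = 0
ORDER BY id;
```

On the sample data this keeps ids 5, 6, 7, 11 and 13. Rows in group 0 (no preceding non-1 row in the session, such as id 13) are never flagged and are therefore shown.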
Answer 2 (score: 0)
SELECT CRow.id
FROM @req_data AS CRow
CROSS APPLY (SELECT MAX(id) AS id FROM @req_data PRev WHERE PRev.Id < CRow.id AND PRev.session_id = CRow.session_id AND PRev.value <> 1 ) MaxPRow
LEFT JOIN @req_data AS PRow ON MaxPRow.id = PRow.id
WHERE CRow.value = 1 AND ISNULL(PRow.value,1) <> 7