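This is a fairly new exercise for me, but I need a way to identify sequence patterns in a table. For example, suppose I have a simple table like the following:
Now I want to identify and group all records that form the value sequence 5, 9, 6, and display them in a query. How would you accomplish this with T-SQL?
The result should look like this:
I have found a few examples that might achieve this, but nothing that actually helps.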
Answer 0 (score: 8)
You can use the following query, wrapped in a CTE, to assign a sequence number to each value in the sequence:
;WITH Seq AS (
    -- Pattern values with their 1-based position in the sequence
    SELECT v, ROW_NUMBER() OVER (ORDER BY k) AS rn
    FROM (VALUES (1, 5), (2, 9), (3, 6)) AS x(k, v)
)
SELECT v, rn
FROM Seq;
Output:
v rn
-------
5 1
9 2
6 3
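The Seq CTE is just a lookup from each pattern value to its position. Below is a minimal Python sketch of the same mapping. It assumes that no value appears twice in the pattern, which the query also assumes implicitly. The function name number_pattern is hypothetical:

```python
def number_pattern(pattern):
    """Map each pattern value to its 1-based position, like the rn column of Seq."""
    positions = {}
    for rn, v in enumerate(pattern, start=1):
        if v in positions:
            # Joining on a repeated value would duplicate table rows, so reject it here.
            raise ValueError(f"value {v} appears more than once in the pattern")
        positions[v] = rn
    return positions


if __name__ == "__main__":
    for v, rn in number_pattern([5, 9, 6]).items():
        print(v, rn)
```

Run as a script, it prints the same three v/rn pairs as the output above.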
Using the CTE above, you can identify islands, i.e. runs of consecutive rows that contain the whole sequence:
;WITH Seq AS (
SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn
FROM (VALUES(1, 5), (2, 9), (3, 6)) x(k,v)
), Grp AS (
SELECT [Key], [Value],
ROW_NUMBER() OVER (ORDER BY [Key]) - rn AS grp
FROM mytable AS m
LEFT JOIN Seq AS s ON m.Value = s.v
)
SELECT *
FROM Grp
Output:
Key Value grp
-----------------
1 5 0
2 9 0
3 6 0
6 5 3
7 9 3
8 6 3
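The grp field identifies exactly these islands: all rows of one complete occurrence share the same grp value.
Now all you need to do is filter out the partial groups: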
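Why grp works: within one complete occurrence, the overall row number and the position in the pattern both grow by one per row, so their difference stays constant. Below is a small Python sketch of the Grp CTE. It uses made-up sample rows; the data in SAMPLE is hypothetical, not the question's table:

```python
# Hypothetical (Key, Value) rows: two full 5, 9, 6 runs separated by unrelated values.
SAMPLE = [(1, 5), (2, 9), (3, 6), (4, 2), (5, 7), (6, 5), (7, 9), (8, 6)]


def island_ids(rows, pattern):
    """Return (key, value, grp) per row, ordered by key.

    grp = ROW_NUMBER() OVER (ORDER BY key) - rn; rows whose value is not in
    the pattern get None, like the NULL rn produced by the LEFT JOIN.
    """
    positions = {v: rn for rn, v in enumerate(pattern, start=1)}
    result = []
    for row_number, (key, value) in enumerate(sorted(rows), start=1):
        rn = positions.get(value)
        result.append((key, value, None if rn is None else row_number - rn))
    return result


if __name__ == "__main__":
    for row in island_ids(SAMPLE, [5, 9, 6]):
        print(*row)
```

With this sample, keys 1 to 3 get grp 0 and keys 6 to 8 get grp 5. The absolute value depends on how many rows precede the island. What matters is only that all rows of one island share it.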
;WITH Seq AS (
SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn
FROM (VALUES(1, 5), (2, 9), (3, 6)) x(k,v)
), Grp AS (
SELECT [Key], [Value],
ROW_NUMBER() OVER (ORDER BY [Key]) - rn AS grp
FROM mytable AS m
LEFT JOIN Seq AS s ON m.Value = s.v
)
SELECT g1.[Key], g1.[Value]
FROM Grp AS g1
INNER JOIN (
SELECT grp
FROM Grp
GROUP BY grp
HAVING COUNT(*) = 3 ) AS g2
ON g1.grp = g2.grp
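Below is the same filtering step in Python, as a sketch for checking the logic outside SQL Server. It keeps only groups whose size equals the pattern length (the hard-coded 3 in HAVING COUNT(*) = 3). Because the pattern values are assumed distinct, such a group must hold positions 1, 2, 3 on consecutive rows, in order. The function find_pattern_rows and the sample rows are hypothetical:

```python
from collections import Counter


def find_pattern_rows(rows, pattern):
    """Return the (key, value) rows that belong to a complete run of pattern."""
    positions = {v: rn for rn, v in enumerate(pattern, start=1)}
    tagged = []
    for row_number, (key, value) in enumerate(sorted(rows), start=1):
        rn = positions.get(value)
        if rn is not None:  # rows with a NULL grp never match the join on grp
            tagged.append((key, value, row_number - rn))
    group_sizes = Counter(grp for _, _, grp in tagged)
    # Equivalent of HAVING COUNT(*) = <pattern length>
    return [(key, value) for key, value, grp in tagged
            if group_sizes[grp] == len(pattern)]


if __name__ == "__main__":
    sample = [(1, 5), (2, 9), (3, 6), (4, 2), (5, 7), (6, 5), (7, 9), (8, 6)]
    print(find_pattern_rows(sample, [5, 9, 6]))
```

Partial runs (for example 5, 9 followed by an unrelated value) and reversed runs (6, 9, 5) produce groups smaller than the pattern, so they are dropped.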
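Note: the initial version of this answer used an INNER JOIN to Seq. That does not work if the table contains values such as 5, 42, 9, 6: the INNER JOIN filters out 42, so this sequence is wrongly identified as valid. Thanks to @HABO for this edit.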
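The two versions differ in whether non-matching rows are removed before or after row numbering. Below is a Python sketch that reproduces both, using the 5, 42, 9, 6 example from the note. The drop_unmatched_first flag is a hypothetical stand-in for the old INNER JOIN:

```python
from collections import Counter


def match_rows(rows, pattern, drop_unmatched_first=False):
    """Return rows in complete runs of pattern, numbering rows before or after filtering."""
    positions = {v: rn for rn, v in enumerate(pattern, start=1)}
    ordered = sorted(rows)
    if drop_unmatched_first:
        # Old INNER JOIN: rows like 42 vanish before ROW_NUMBER() is computed,
        # so the survivors get consecutive row numbers they did not really have.
        ordered = [(k, v) for k, v in ordered if v in positions]
    tagged = [(k, v, n - positions[v])
              for n, (k, v) in enumerate(ordered, start=1) if v in positions]
    sizes = Counter(grp for _, _, grp in tagged)
    return [(k, v) for k, v, grp in tagged if sizes[grp] == len(pattern)]


if __name__ == "__main__":
    rows = [(1, 5), (2, 42), (3, 9), (4, 6)]
    print(match_rows(rows, [5, 9, 6], drop_unmatched_first=True))  # false positive
    print(match_rows(rows, [5, 9, 6]))  # LEFT JOIN behaviour: no match
```

The first call reports keys 1, 3 and 4 as a match. The second returns an empty list, because 42 still occupies a row number and splits the run into two groups.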
Answer 1 (score: 1)
Not very optimized, but here is an alternative answer:
-- Pattern table: rowID gives each value's position in the sequence
CREATE TABLE pattern (
    rowID INT IDENTITY(1,1) PRIMARY KEY,
    rowValue INT NOT NULL
);
INSERT INTO pattern (rowValue) VALUES (5);
INSERT INTO pattern (rowValue) VALUES (9);
INSERT INTO pattern (rowValue) VALUES (6);
SELECT * FROM pattern;

SELECT Trg.* FROM Keys Trg
INNER JOIN pattern Pt ON (Trg.fValue = Pt.rowValue)
INNER JOIN (
    -- Offsets (fKey - rowID) shared by as many rows as the pattern has values
    SELECT K.fKey - P.rowID AS X, COUNT(*) AS Xc FROM Keys K
    INNER JOIN pattern P ON (K.fValue = P.rowValue)
    GROUP BY K.fKey - P.rowID
    HAVING COUNT(*) = (SELECT COUNT(*) FROM pattern)
) Z ON (Trg.fKey - Pt.rowID = Z.X);
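I put the pattern in a table and join it to the main table. I compute the difference between the table's Key and the pattern's key, and show only the rows whose difference matches (and only when the number of rows sharing that difference equals the number of rows in the pattern table).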
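Below is a Python sketch of this key-offset idea, mirroring the query's Keys and pattern tables with plain lists; the function name is hypothetical. It subtracts the pattern position from the stored key rather than from a ROW_NUMBER(), so it assumes the keys have no gaps. If a key is missing inside a run, the offsets differ and the run is not reported.

```python
from collections import Counter


def match_by_key_offset(rows, pattern):
    """Rows whose fKey - rowID offset is shared by len(pattern) matching rows."""
    row_ids = {v: i for i, v in enumerate(pattern, start=1)}  # plays the role of pattern.rowID
    # Subquery Z: count matching rows per offset X = fKey - rowID
    offsets = Counter(key - row_ids[value] for key, value in rows if value in row_ids)
    return sorted((key, value) for key, value in rows
                  if value in row_ids and offsets[key - row_ids[value]] == len(pattern))


if __name__ == "__main__":
    print(match_by_key_offset([(1, 5), (2, 9), (3, 6), (4, 2), (5, 5)], [5, 9, 6]))
    print(match_by_key_offset([(1, 5), (2, 9), (4, 6)], [5, 9, 6]))  # gap at key 3
```

The first call finds keys 1 to 3. The second finds nothing, even though 5, 9, 6 are adjacent rows, because key 3 is missing. The ROW_NUMBER() approach in the other answer does not have this limitation.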