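I am trying to perform something like a column-based intersection on two tables. The tables are:
LogTag: a log can contain zero or more tags.
MatchingRule: a matching rule consists of one or more tags that define the rule.
A log can match zero or more rules. I will pass in a MatchingRuleID and expect to get back all logs that match that rule.
Expected result: a result set of matching LogID values. For example, passing MatchingRuleID = 30 should return LogID 101, and MatchingRuleID = 31 should return LogIDs 100 and 101.
In addition, the LogTag table may have millions of rows, so an efficient query is preferred.
Question: how do I find all LogIDs that match the definition of a specified rule?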
Schema:
CREATE TABLE dbo.Tag
(
TagID INT,
TagName NVARCHAR(50)
)
INSERT INTO dbo.Tag (TagID, TagName)
VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3')
CREATE TABLE dbo.LogTag
(
LogID INT,
TagID INT
)
INSERT INTO dbo.LogTag (LogID, TagID)
VALUES (100, 1), (101, 1), (101, 2), (101, 3), (101, 4), (102, 2), (102, 3)
CREATE TABLE dbo.MatchingRule
(
MatchingRuleID INT,
TagID INT
)
INSERT INTO dbo.MatchingRule (MatchingRuleID, TagID)
VALUES (30, 1), (30, 2), (30, 3), (31, 1)
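Answer 0 (score: 2)
Using the right clustered index on the tables is important. I added an alternative key in a comment above #log_tag, which may improve performance on large sets. Since I don't have a suitable sample to test with, you will have to verify which one performs best.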
CREATE TABLE #tag(tag_id INT PRIMARY KEY,tag_name NVARCHAR(50));
INSERT INTO #tag (tag_id,tag_name)VALUES
(1,'tag1'),(2,'tag2'),(3,'tag3');
-- Try this key for large sets: PRIMARY KEY(tag_id,log_id));
CREATE TABLE #log_tag(log_id INT,tag_id INT,PRIMARY KEY(log_id,tag_id))
INSERT INTO #log_tag (log_id,tag_id)VALUES
(100,1),(101,1),(101,2),(101,3),(101,4),(102,2),(102,3);
CREATE TABLE #matching_rule(matching_rule_id INT,tag_id INT,PRIMARY KEY(matching_rule_id,tag_id));
INSERT INTO #matching_rule(matching_rule_id,tag_id)VALUES
(30,1),(30,2),(30,3),(31,1);
DECLARE @matching_rule_id INT=31;
;WITH required_tags AS (
SELECT tag_id
FROM #matching_rule
WHERE matching_rule_id=@matching_rule_id
)
SELECT lt.log_id
FROM required_tags AS rt
INNER JOIN #log_tag AS lt ON
lt.tag_id=rt.tag_id
GROUP BY lt.log_id
HAVING COUNT(*)=(SELECT COUNT(*) FROM required_tags);
DROP TABLE #log_tag;
DROP TABLE #matching_rule;
DROP TABLE #tag;
The results match your expected results for both rule 30 and rule 31.
Execution plan with the index used in the script:
Answer 1 (score: 1)
Try this query:
ThreadGroup
YourSampler
Regular Expression Extractor (match -1, any template)
Foreach controller
Counter(Maximum -> ${Result_matchNr} , Rf Name -> index)
LinkSamplerUsingParsedData(use -> ${__V(Result_${index}_g1)}
Answer 2 (score: 1)
Note: this only works on SQL Server 2008+.
Here is the query I came up with:
DECLARE @RuleID INT
SELECT @RuleID = 30
SELECT LogID
FROM LogTag lt
INNER JOIN (
SELECT TagID, MatchingRuleID, COUNT(*) OVER (PARTITION BY MatchingRuleID) TagCount
FROM MatchingRule
) mr
ON lt.TagID = mr.TagID
AND mr.MatchingRuleID = @RuleID
GROUP BY LogID, TagCount
HAVING COUNT(*) = TagCount
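So basically I join on every TagID in the specified matching rule, and then, once the rows are filtered and grouped by log, I check whether the number of tags in the MatchingRule table for that rule equals the number of matching tags found for the log in the LogTag table.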
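The counting idea described above can be checked on the question's sample data. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the helper names (build_sample_db, logs_matching_rule) are made up for this illustration. It also assumes that each (LogID, TagID) pair and each (MatchingRuleID, TagID) pair appears at most once, since duplicates would skew COUNT(*).

```python
import sqlite3

# Sample data copied from the question (the Tag table is not needed here).
SAMPLE_SQL = """
CREATE TABLE LogTag (LogID INTEGER, TagID INTEGER);
INSERT INTO LogTag (LogID, TagID) VALUES
    (100, 1), (101, 1), (101, 2), (101, 3), (101, 4), (102, 2), (102, 3);
CREATE TABLE MatchingRule (MatchingRuleID INTEGER, TagID INTEGER);
INSERT INTO MatchingRule (MatchingRuleID, TagID) VALUES
    (30, 1), (30, 2), (30, 3), (31, 1);
"""

# Keep only LogTag rows whose tag belongs to the rule, group by log, and
# accept a log when its number of matched tags equals the rule's tag count.
# Assumes no duplicate rows in either table.
MATCH_SQL = """
SELECT lt.LogID
FROM LogTag AS lt
JOIN MatchingRule AS mr ON mr.TagID = lt.TagID
WHERE mr.MatchingRuleID = :rule_id
GROUP BY lt.LogID
HAVING COUNT(*) = (SELECT COUNT(*)
                   FROM MatchingRule
                   WHERE MatchingRuleID = :rule_id)
ORDER BY lt.LogID
"""


def build_sample_db():
    """Create an in-memory database holding the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SAMPLE_SQL)
    return conn


def logs_matching_rule(conn, rule_id):
    """Return the sorted LogIDs that carry every tag of the given rule."""
    rows = conn.execute(MATCH_SQL, {"rule_id": rule_id}).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_sample_db()
    print(logs_matching_rule(db, 30))  # [101]
    print(logs_matching_rule(db, 31))  # [100, 101]
```

Log 102 is dropped for rule 30 because it matches only two of the three required tags, and the extra tag 4 on log 101 does not affect the count because the join never sees it.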
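Answer 3 (score: 1)
It should be:
declare @MatchingRuleID int = 30; -- example value; the original answer left this variable undeclared
with rules as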
(
select TagID, cnt = sum(count(*)) over()
from dbo.MatchingRule
where MatchingRuleID = @MatchingRuleID
group by TagID
)
select LogID
from rules r
inner join LogTag lt on r.TagID = lt.TagID
group by LogID, cnt
having count(*) = r.cnt
Answer 4 (score: 0)
select l.LogID
from dbo.MatchingRule r
inner join dbo.LogTag l on l.TagID = r.TagID
where r.MatchingRuleID = 31
Another approach is to first collect all of the rule's tags (here in a @Tags table variable), then:
select l.LogID
from dbo.LogTag l
where exists(select 1 from @Tags t where t.TagID = l.TagID)
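One thing worth checking with the EXISTS form is that it returns logs sharing at least one tag with the rule, which coincides with the expected result only when the rule has a single tag (as rule 31 does). The sketch below contrasts that with a double NOT EXISTS variant that requires every tag. It is an illustration under assumptions, not the answer author's code: it runs on Python's built-in sqlite3 instead of SQL Server, reads the rule's tags straight from MatchingRule in place of the @Tags variable, and the function names are hypothetical.

```python
import sqlite3

# Sample data copied from the question.
SAMPLE_SQL = """
CREATE TABLE LogTag (LogID INTEGER, TagID INTEGER);
INSERT INTO LogTag (LogID, TagID) VALUES
    (100, 1), (101, 1), (101, 2), (101, 3), (101, 4), (102, 2), (102, 3);
CREATE TABLE MatchingRule (MatchingRuleID INTEGER, TagID INTEGER);
INSERT INTO MatchingRule (MatchingRuleID, TagID) VALUES
    (30, 1), (30, 2), (30, 3), (31, 1);
"""

# "Any tag": a log qualifies as soon as one of its tags is in the rule.
ANY_TAG_SQL = """
SELECT DISTINCT l.LogID
FROM LogTag AS l
WHERE EXISTS (SELECT 1 FROM MatchingRule AS r
              WHERE r.MatchingRuleID = :rule_id AND r.TagID = l.TagID)
ORDER BY l.LogID
"""

# "All tags": a log qualifies when no tag of the rule is missing from it.
# Caveat: a rule with no tags at all is matched by every log.
ALL_TAGS_SQL = """
SELECT DISTINCT l.LogID
FROM LogTag AS l
WHERE NOT EXISTS (
    SELECT 1 FROM MatchingRule AS r
    WHERE r.MatchingRuleID = :rule_id
      AND NOT EXISTS (SELECT 1 FROM LogTag AS l2
                      WHERE l2.LogID = l.LogID AND l2.TagID = r.TagID))
ORDER BY l.LogID
"""


def make_db():
    """Create an in-memory database holding the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SAMPLE_SQL)
    return conn


def logs_with_any_tag(conn, rule_id):
    """LogIDs that share at least one tag with the rule."""
    return [r[0] for r in conn.execute(ANY_TAG_SQL, {"rule_id": rule_id})]


def logs_with_all_tags(conn, rule_id):
    """LogIDs that carry every tag of the rule."""
    return [r[0] for r in conn.execute(ALL_TAGS_SQL, {"rule_id": rule_id})]


if __name__ == "__main__":
    db = make_db()
    print(logs_with_any_tag(db, 30))   # [100, 101, 102]
    print(logs_with_all_tags(db, 30))  # [101]
```

For rule 31 both functions return logs 100 and 101, which is why the difference is easy to miss when testing with a single-tag rule.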