Column-based intersection of two tables

Time: 2016-02-13 01:04:56

Tags: sql-server tsql sql-server-2012

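I am trying to perform something like a column-based intersection on two tables. The tables are:

  • LogTag: a log can have zero or more tags
  • MatchingRule: a matching rule consists of one or more tags that define the rule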

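A log can match zero or more rules. I will pass in a MatchingRuleID and expect to get back all logs that match that rule.

Expected result: a result set of matching LogIDs. For example, passing MatchingRuleID = 30 should return LogID 101, and MatchingRuleID = 31 should return LogIDs 100 and 101.

Also, the LogTag table may have millions of rows, so an efficient query is preferred.

Question: how do I find all LogIDs that match the definition of a specified rule?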

Schema:

CREATE TABLE dbo.Tag
(
    TagID INT,
    TagName NVARCHAR(50)
)
INSERT INTO dbo.Tag (TagID, TagName)
VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3')

CREATE TABLE dbo.LogTag
(
    LogID INT,
    TagID INT
)
INSERT INTO dbo.LogTag (LogID, TagID)
VALUES (100, 1), (101, 1), (101, 2), (101, 3), (101, 4), (102, 2), (102, 3)  

CREATE TABLE dbo.MatchingRule
(
    MatchingRuleID INT,
    TagID INT
)
INSERT INTO dbo.MatchingRule (MatchingRuleID, TagID)
VALUES (30, 1), (30, 2), (30, 3), (31, 1)

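5 answers:

Answer 0 (score: 2)

Having the right clustered index on the tables is important. I added an alternative key in a comment on #log_tag, which may improve performance for large sets. Since I have no suitable sample data to test with, you will have to verify which one performs best.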

CREATE TABLE #tag(tag_id INT PRIMARY KEY,tag_name NVARCHAR(50));
INSERT INTO #tag (tag_id,tag_name)VALUES
    (1,'tag1'),(2,'tag2'),(3,'tag3');

-- Try this key for large sets: PRIMARY KEY(tag_id,log_id));
CREATE TABLE #log_tag(log_id INT,tag_id INT,PRIMARY KEY(log_id,tag_id))
INSERT INTO #log_tag (log_id,tag_id)VALUES
    (100,1),(101,1),(101,2),(101,3),(101,4),(102,2),(102,3);

CREATE TABLE #matching_rule(matching_rule_id INT,tag_id INT,PRIMARY KEY(matching_rule_id,tag_id));
INSERT INTO #matching_rule(matching_rule_id,tag_id)VALUES
    (30,1),(30,2),(30,3),(31,1);

DECLARE @matching_rule_id INT=31;

;WITH required_tags AS (
    SELECT tag_id
    FROM #matching_rule
    WHERE matching_rule_id=@matching_rule_id
)
SELECT lt.log_id
FROM required_tags AS rt 
     INNER JOIN #log_tag AS lt ON
         lt.tag_id=rt.tag_id
GROUP BY lt.log_id
HAVING COUNT(*)=(SELECT COUNT(*) FROM required_tags);

DROP TABLE #log_tag;
DROP TABLE #matching_rule;
DROP TABLE #tag;

The results match your expected results for both rule 30 and rule 31.

[Figure: execution plan for the index used in the script]
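
As a rough sketch of how the alternative key from the comment above could be tried on the permanent table from the question: the statement below creates a clustered index led by TagID. The index name IX_LogTag_TagID_LogID is hypothetical, and the UNIQUE keyword assumes dbo.LogTag never holds the same (TagID, LogID) pair twice, which is true for the sample data but not stated in the question.

```sql
-- Hypothetical index name; assumes every (TagID, LogID) pair in dbo.LogTag is unique.
-- Leading with TagID lets the join on TagID seek directly to the rows for each required tag.
CREATE UNIQUE CLUSTERED INDEX IX_LogTag_TagID_LogID
    ON dbo.LogTag (TagID, LogID);
```

Whether this beats a key led by LogID depends on the data distribution, so it is worth comparing the execution plans and I/O for both options on a realistic data set.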

Answer 1 (score: 1)

Try this query

Fiddle Here

Answer 2 (score: 1)

Note: this only works on SQL Server 2008+.

Here is the query I came up with:

DECLARE @RuleID INT
SELECT @RuleID = 30

SELECT LogID
FROM LogTag lt
    INNER JOIN (
        SELECT TagID, MatchingRuleID, COUNT(*) OVER (PARTITION BY MatchingRuleID) TagCount
        FROM MatchingRule
    ) mr 
    ON lt.TagID = mr.TagID
        AND mr.MatchingRuleID = @RuleID
GROUP BY LogID, TagCount
HAVING COUNT(*) = TagCount

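In short, I join LogTag to the TagIDs of the specified matching rule, then check whether the number of tags the rule has in the MatchingRule table (TagCount) equals the number of matching tags found for each log in the LogTag table (now filtered and grouped).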
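
To make the counting step easier to follow, here is a small diagnostic sketch that shows both counts side by side instead of filtering on them. It runs against the dbo tables from the question; the column aliases MatchedTags and RequiredTags are made up for illustration.

```sql
DECLARE @RuleID INT = 30;

-- For each log, count how many of the rule's tags it carries
-- and show that next to the number of tags the rule requires.
SELECT lt.LogID,
       COUNT(*) AS MatchedTags,
       (SELECT COUNT(*)
        FROM dbo.MatchingRule
        WHERE MatchingRuleID = @RuleID) AS RequiredTags
FROM dbo.LogTag AS lt
    INNER JOIN dbo.MatchingRule AS mr
        ON mr.TagID = lt.TagID
       AND mr.MatchingRuleID = @RuleID
GROUP BY lt.LogID
ORDER BY lt.LogID;

-- With the sample data and @RuleID = 30:
-- LogID  MatchedTags  RequiredTags
-- 100    1            3
-- 101    3            3
-- 102    2            3
```

Only LogID 101 has MatchedTags equal to RequiredTags, which is exactly the row the HAVING clause keeps.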

Answer 3 (score: 1)

It should be

declare @MatchingRuleID int = 30;  -- sample rule from the question

; with rules as
(
    select  TagID, cnt = sum(count(*)) over()
    from    dbo.MatchingRule
    where   MatchingRuleID  = @MatchingRuleID
    group by TagID
)
select  LogID
from    rules r
    inner join LogTag lt    on  r.TagID = lt.TagID
group by LogID, cnt
having  count(*) = r.cnt

Answer 4 (score: 0)

select l.LogID
from dbo.MatchingRule r
inner join dbo.LogTag l on l.TagID = r.TagID
where r.MatchingRuleID = 31

Another approach is to identify all the tags first, and then:

select l.LogID
from dbo.LogTag l
where exists(select 1 from @Tags t where t.TagID = l.TagID)
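
The fragment above refers to a @Tags table variable that is never declared, and as written the EXISTS filter returns every log that carries at least one of the tags. A possible completion is sketched below. The declaration of @Tags, the way it is filled, and the added GROUP BY / HAVING step are all assumptions rather than part of the original answer, and the COUNT(*) comparison assumes dbo.LogTag has no duplicate (LogID, TagID) rows.

```sql
-- Assumed declaration: the original answer does not show how @Tags is built.
DECLARE @Tags TABLE (TagID INT PRIMARY KEY);

INSERT INTO @Tags (TagID)
SELECT TagID
FROM dbo.MatchingRule
WHERE MatchingRuleID = 30;

-- Keep only the rows whose tag belongs to the rule, then require
-- that a log carries as many of those tags as the rule defines.
SELECT l.LogID
FROM dbo.LogTag AS l
WHERE EXISTS (SELECT 1 FROM @Tags AS t WHERE t.TagID = l.TagID)
GROUP BY l.LogID
HAVING COUNT(*) = (SELECT COUNT(*) FROM @Tags);

-- With the sample data this returns LogID 101.
```

If duplicate (LogID, TagID) rows are possible, COUNT(DISTINCT l.TagID) would be the safer comparison.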