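Data difference/comparison between three tables - SQL Server 2014

Time: 2018-03-14 18:43:02

Tags: sql sql-server-2014

I have three tables in SQL Server 2014, each holding millions of rows and still growing. I am trying to find the differences between the tables, for example: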

DECLARE @ab TABLE
(
    k1 int,
    k2 int,
    val char(1)
)

DECLARE @cd TABLE
(
    k1 int,
    k2 int,
    val char(1),
    add_cd varchar(50)
)

DECLARE @ef TABLE
(
    k1 int,
    k2 int,
    val char(1),
    add_ef varchar(50)
)

INSERT INTO @ab VALUES(1,1,'a'), (2, 2, 'c'), (3, 3, 'c'), (4, 4, 'd'), (5, 5, NULL), (7, 7, 'g')

INSERT INTO @cd VALUES(1,1,'a', 'DSFS'), (2, 2, 'b', 'ASDF'), (4, 4, NULL, 'SDFE')

INSERT INTO @ef VALUES(1,1,'a', 'SD1245'), (2, 2, 'b', 'EW3464'), (3, 3, 'd', 'DF3452'),(4, 4, 'd', 'FG4576'), (6, 6, 'e', 'RT3453')

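The common key columns across all three sets are k1 and k2. I want to pull out only the differences: either the value of "val" differs, or the key combination does not exist in all three sets. The other columns (add_cd and add_ef) do not need to be compared, but they are required in the final result. The expected result is: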

k1   k2   val   k1    k2    val  add_cd  k1   k2    val   add_ef
2    2    c     2     2     b    ASDF    2    2     b     EW3464 
3    3    c     NULL  NULL  NULL NULL    3    3     d     DF3452
4    4    d     4     4     NULL SDFE    4    4     d     FG4576
5    5    NULL  NULL  NULL  NULL NULL    NULL NULL  NULL  NULL
NULL NULL NULL  NULL  NULL  NULL NULL    6    6     e     RT3453
7    7    g     NULL  NULL  NULL NULL    NULL NULL  NULL  NULL

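I tried the query below. It gives the desired result, but it only works with thousands of rows, not millions. I created indexes on the key columns, but I found that the query uses a table scan. Can anyone advise on this?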

SELECT a.*, c.*, e.*
FROM @ab a
FULL OUTER JOIN @cd c   ON  a.k1 = c.k1
                        AND a.k2 = c.k2
FULL OUTER JOIN @ef e   ON  (c.k1 = e.k1
                        AND c.k2 = e.k2 ) 
                        OR (a.k1 = e.k1
                        AND a.k2 = e.k2 )       
WHERE   (a.k1 IS NULL OR c.k1 IS NULL OR e.k1 IS NULL)
OR      (ISNULL(a.val, '') != ISNULL(c.val, ''))
OR      (ISNULL(c.val, '') != ISNULL(e.val, ''))
OR      (ISNULL(a.val, '') != ISNULL(e.val, ''))

3 answers:

Answer 0 (score: 2)

Your existing query is the right approach. You can make a few small changes to improve it. The index on each table should be on k1, k2, val.

Edit (my original NULL handling was incorrect. The correct way looks verbose, but it is probably the most efficient solution that is logically correct):

SELECT a.*, c.*, e.*
FROM @ab a
FULL OUTER JOIN @cd c   ON  a.k1 = c.k1
                        AND a.k2 = c.k2
FULL OUTER JOIN @ef e   ON  (c.k1 = e.k1
                        AND c.k2 = e.k2 )
                        --OR (a.k1 = e.k1   --This condition is not needed and will only slow performance
                        --AND a.k2 = e.k2 )
WHERE   (a.k1 IS NULL OR c.k1 IS NULL OR e.k1 IS NULL)
--OR      (ISNULL(a.val, '') != ISNULL(c.val, ''))   --Wrapping the val columns in ISNULL prevents the indexes from being used
--OR      (ISNULL(c.val, '') != ISNULL(e.val, ''))
--OR      (ISNULL(a.val, '') != ISNULL(e.val, ''))
OR      ((a.val != c.val) OR (a.val IS NULL AND c.val IS NOT NULL) OR (a.val IS NOT NULL AND c.val IS NULL))
OR      ((a.val != e.val) OR (a.val IS NULL AND e.val IS NOT NULL) OR (a.val IS NOT NULL AND e.val IS NULL))
OR      ((e.val != c.val) OR (e.val IS NULL AND c.val IS NOT NULL) OR (e.val IS NOT NULL AND c.val IS NULL))

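When you need to compare nullable columns, comparing the ISNULL() results can feel more elegant, but wrapping the columns in a function prevents the query engine from using the indexes and forces a table scan, which is about the worst thing you can do for performance.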
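As a side note on NULL-safe comparison without ISNULL(): T-SQL set operators treat two NULLs as equal, so `NOT EXISTS (SELECT x INTERSECT SELECT y)` can express "x differs from y, counting NULLs". The sketch below is only an illustration on two of the question's tables, not the answer author's method; the (5, 5) row in @cd and its 'QWER' value are made-up sample data, added so that a NULL/NULL pair is present.

```sql
-- Cut-down copies of two of the question's tables.
DECLARE @ab TABLE (k1 int, k2 int, val char(1));
DECLARE @cd TABLE (k1 int, k2 int, val char(1), add_cd varchar(50));

INSERT INTO @ab VALUES (1, 1, 'a'), (2, 2, 'c'), (4, 4, 'd'), (5, 5, NULL);
-- The (5, 5, NULL, 'QWER') row is hypothetical, added for the NULL/NULL case.
INSERT INTO @cd VALUES (1, 1, 'a', 'DSFS'), (2, 2, 'b', 'ASDF'),
                       (4, 4, NULL, 'SDFE'), (5, 5, NULL, 'QWER');

-- Keep only key matches whose val differs. INTERSECT returns a row when
-- both values are equal OR both are NULL, so NOT EXISTS means "different".
SELECT a.k1, a.k2, a.val AS ab_val, c.val AS cd_val, c.add_cd
FROM @ab AS a
JOIN @cd AS c ON a.k1 = c.k1 AND a.k2 = c.k2
WHERE NOT EXISTS (SELECT a.val INTERSECT SELECT c.val);
```

With this sample data the query should return the (2, 2) and (4, 4) rows and skip (1, 1) and (5, 5). Whether the optimizer uses an index for it on the real tables is something to check in the execution plan.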

Answer 1 (score: 0)

Would something like this work for you?

SELECT Z.k1, Z.k2, Z.val, Y.k1, Y.k2, Y.val, Y.add_cd, X.k1, X.k2, X.val, X.add_ef
FROM @ab AS Z 
FULL OUTER JOIN @cd AS Y ON Z.k1 = Y.k1 AND Z.k2 = Y.k2
FULL OUTER JOIN @ef AS X ON X.k1 = Y.k1 AND X.k2 = Y.k2
WHERE NOT EXISTS (
    SELECT A.k1, A.k2, A.val, C.k1, C.k2, C.val, C.add_cd, E.k1, E.k2, E.val, E.add_ef
    FROM @ab AS A
    INNER JOIN @cd AS C ON A.k1 = C.k1 AND A.k2 = C.k2 AND A.val = C.val
    INNER JOIN @ef AS E ON C.k1 = E.k1 AND C.k2 = E.k2 AND C.val = E.val
    WHERE Z.k1 = A.k1 AND Z.k2 = A.k2 AND Y.k1 = C.k1 AND Y.k2 = C.k2 AND X.k1 = E.k1 AND X.k2 = E.k2
)

I am concerned that your NULLs may carry nuances that make them compare differently from how you would want them to...
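To make the NULL concern concrete, here is a minimal sketch with two hypothetical table variables (@t1 and @t2 are not from the question). With the default ANSI_NULLS ON setting, `NULL = NULL` evaluates to UNKNOWN, so a join condition such as `A.val = C.val` does not match two rows whose val is NULL in both tables.

```sql
-- Hypothetical tables: key 2 has val = NULL on both sides.
DECLARE @t1 TABLE (k1 int, val char(1));
DECLARE @t2 TABLE (k1 int, val char(1));

INSERT INTO @t1 VALUES (1, 'a'), (2, NULL);
INSERT INTO @t2 VALUES (1, 'a'), (2, NULL);

-- Equality on val: the NULL/NULL pair for k1 = 2 is not matched,
-- because NULL = NULL is UNKNOWN rather than TRUE.
SELECT l.k1
FROM @t1 AS l
JOIN @t2 AS r ON l.k1 = r.k1 AND l.val = r.val;
```

Under those assumptions only k1 = 1 comes back, so a NOT EXISTS built on such a join would report a key whose val is NULL everywhere as a difference.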

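Answer 2 (score: 0)

I think you are on the right track with the full outer join; you just need to get the where clause working for you. It may not be the most efficient answer, but it will do the job.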

select *
from @ab as ab
full outer join @cd as cd on ab.k1 = cd.k1
                         and ab.k2 = cd.k2
full outer join @ef as ef on ab.k1 = ef.k1
                         and ab.k2 = ef.k2
where (
        isnull(ab.val, 'X') <> isnull(cd.val, 'XX')
        or
        isnull(ab.val, 'X') <> isnull(ef.val, 'XX')
        or
        isnull(cd.val, 'X') <> isnull(ef.val, 'XX')
        or
        coalesce(ab.val, cd.val, ef.val) is NULL
    )
order by coalesce(ab.k1, cd.k1, ef.k1)
, coalesce(ab.k2, cd.k2, ef.k2)

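The parentheses wrap the entire where clause in case you add another constraint (you don't want the and / or precedence to be misread). The order by clause is only there to help match the order of the expected output shown in the question.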
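One detail worth checking when using sentinel values like the 'X' and 'XX' in the query above: ISNULL() returns the data type of its first argument, so with val declared as char(1) a two-character replacement may be truncated. The sketch below uses a hypothetical variable @val standing in for a NULL val column and compares ISNULL() with COALESCE(), which derives its type from all of its arguments.

```sql
-- @val is a hypothetical stand-in for a NULL val column of type char(1).
DECLARE @val char(1) = NULL;

SELECT ISNULL(@val, 'XX')               AS isnull_result,   -- typed as char(1)
       DATALENGTH(ISNULL(@val, 'XX'))   AS isnull_bytes,
       COALESCE(@val, 'XX')             AS coalesce_result, -- type taken from both arguments
       DATALENGTH(COALESCE(@val, 'XX')) AS coalesce_bytes;
```

If isnull_bytes comes back as 1, the two sentinels are indistinguishable for a char(1) column, and a row where both vals are NULL would then only be caught by the coalesce(...) is NULL test.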