I have three tables in SQL Server 2014, each holding millions of rows and growing. I am trying to find the differences between the tables, for example:
DECLARE @ab TABLE
(
k1 int,
k2 int,
val char(1)
)
DECLARE @cd TABLE
(
k1 int,
k2 int,
val char(1),
add_cd varchar(50)
)
DECLARE @ef TABLE
(
k1 int,
k2 int,
val char(1),
add_ef varchar(50)
)
INSERT INTO @ab VALUES(1,1,'a'), (2, 2, 'c'), (3, 3, 'c'), (4, 4, 'd'), (5, 5, NULL), (7, 7, 'g')
INSERT INTO @cd VALUES(1,1,'a', 'DSFS'), (2, 2, 'b', 'ASDF'), (4, 4, NULL, 'SDFE')
INSERT INTO @ef VALUES(1,1,'a', 'SD1245'), (2, 2, 'b', 'EW3464'), (3, 3, 'd', 'DF3452'),(4, 4, 'd', 'FG4576'), (6, 6, 'e', 'RT3453')
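The common key columns across all three sets are k1 and k2. I want to pull only the differences: either the value of "val" differs, or the key combination does not exist in all three sets. The other columns (add_cd and add_ef) do not need to be compared, but they are required in the final result. The expected result is: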
k1    k2    val   k1    k2    val   add_cd  k1    k2    val   add_ef
2     2     c     2     2     b     ASDF    2     2     b     EW3464
3     3     c     NULL  NULL  NULL  NULL    3     3     d     DF3452
4     4     d     4     4     NULL  SDFE    4     4     d     FG4576
5     5     NULL  NULL  NULL  NULL  NULL    NULL  NULL  NULL  NULL
NULL  NULL  NULL  NULL  NULL  NULL  NULL    6     6     e     RT3453
7     7     g     NULL  NULL  NULL  NULL    NULL  NULL  NULL  NULL
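I tried the query below. It returns the desired result, but it only copes with thousands of rows, not millions. I created indexes on the key columns, but I can see that it still uses a table scan. Can anyone advise on this?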
SELECT a.*, c.*, e.*
FROM @ab a
FULL OUTER JOIN @cd c ON a.k1 = c.k1
AND a.k2 = c.k2
FULL OUTER JOIN @ef e ON (c.k1 = e.k1
AND c.k2 = e.k2 )
OR (a.k1 = e.k1
AND a.k2 = e.k2 )
WHERE (a.k1 IS NULL OR c.k1 IS NULL OR e.k1 IS NULL)
OR (ISNULL(a.val, '') != ISNULL(c.val, ''))
OR (ISNULL(c.val, '') != ISNULL(e.val, ''))
OR (ISNULL(a.val, '') != ISNULL(e.val, ''))
Answer 0 (score: 2)
Your existing query is the right approach. You can make a few small changes to improve it. The index on each table should be on k1, k2, val.
SELECT a.*, c.*, e.*
FROM @ab a
FULL OUTER JOIN @cd c ON a.k1 = c.k1
AND a.k2 = c.k2
FULL OUTER JOIN @ef e ON (c.k1 = e.k1
AND c.k2 = e.k2 )
--OR (a.k1 = e.k1 --This condition is not needed and will only slow performance
--AND a.k2 = e.k2 )
WHERE (a.k1 IS NULL OR c.k1 IS NULL OR e.k1 IS NULL)
--OR (ISNULL(a.val, '') != ISNULL(c.val, '')) --Wrapping the val columns in ISNULL prevents the indexes from being used
--OR (ISNULL(c.val, '') != ISNULL(e.val, ''))
--OR (ISNULL(a.val, '') != ISNULL(e.val, ''))
OR ((a.val != c.val) OR (a.val IS NULL AND c.val IS NOT NULL) OR (a.val IS NOT NULL AND c.val IS NULL))
OR ((a.val != e.val) OR (a.val IS NULL AND e.val IS NOT NULL) OR (a.val IS NOT NULL AND e.val IS NULL))
OR ((e.val != c.val) OR (e.val IS NULL AND c.val IS NOT NULL) OR (e.val IS NOT NULL AND c.val IS NULL))
Edit: my original NULL handling was incorrect. The explicit IS NULL / IS NOT NULL checks in the query above look verbose, but they are probably the most efficient solution that is still logically correct.
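When you need to compare nullable columns, comparing ISNULL() results may feel more elegant, but wrapping the columns in a function can stop the query engine from using the indexes on them and force a table scan, which is about the worst thing you can do for performance.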
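To make the points above concrete, here is a rough sketch rather than the answerer's own code. It assumes real (here temporary) tables so that indexes can be created; the names #ab, #cd, #ef and the ix_* index names are hypothetical. It also differs from the answer in two ways. First, it joins @ef on COALESCE(a.k1, c.k1), because with the OR branch simply commented out, a key that exists in ab and ef but not in cd (such as 3, 3 in the sample data) comes back as two separate rows instead of one. Second, it uses EXISTS (SELECT ... EXCEPT SELECT ...) as a shorter NULL-safe "is different" test, which relies on EXCEPT treating two NULLs as equal. Whether this performs well at millions of rows is something to check against the actual execution plan.

```sql
-- Hypothetical stand-ins for the three real tables.
CREATE TABLE #ab (k1 int, k2 int, val char(1));
CREATE TABLE #cd (k1 int, k2 int, val char(1), add_cd varchar(50));
CREATE TABLE #ef (k1 int, k2 int, val char(1), add_ef varchar(50));

-- Index layout suggested in the answer: key columns first, then val.
CREATE INDEX ix_ab_k1_k2_val ON #ab (k1, k2, val);
CREATE INDEX ix_cd_k1_k2_val ON #cd (k1, k2, val);
CREATE INDEX ix_ef_k1_k2_val ON #ef (k1, k2, val);

-- Sample data from the question.
INSERT INTO #ab VALUES (1, 1, 'a'), (2, 2, 'c'), (3, 3, 'c'), (4, 4, 'd'), (5, 5, NULL), (7, 7, 'g');
INSERT INTO #cd VALUES (1, 1, 'a', 'DSFS'), (2, 2, 'b', 'ASDF'), (4, 4, NULL, 'SDFE');
INSERT INTO #ef VALUES (1, 1, 'a', 'SD1245'), (2, 2, 'b', 'EW3464'), (3, 3, 'd', 'DF3452'),
                       (4, 4, 'd', 'FG4576'), (6, 6, 'e', 'RT3453');

SELECT a.*, c.*, e.*
FROM #ab AS a
FULL OUTER JOIN #cd AS c
    ON  c.k1 = a.k1
    AND c.k2 = a.k2
FULL OUTER JOIN #ef AS e
    -- Match ef against whichever of ab / cd supplied the key.
    ON  e.k1 = COALESCE(a.k1, c.k1)
    AND e.k2 = COALESCE(a.k2, c.k2)
WHERE a.k1 IS NULL OR c.k1 IS NULL OR e.k1 IS NULL   -- key missing from at least one table
   -- NULL-safe "a.val differs from c.val": EXCEPT treats NULL and NULL as equal.
   OR EXISTS (SELECT a.val EXCEPT SELECT c.val)
   -- If a.val matches both c.val and e.val, then c.val and e.val match too,
   -- so a third comparison is not needed.
   OR EXISTS (SELECT a.val EXCEPT SELECT e.val)
ORDER BY COALESCE(a.k1, c.k1, e.k1), COALESCE(a.k2, c.k2, e.k2);

DROP TABLE #ab, #cd, #ef;
```

Like the original query, this assumes that (k1, k2) is unique within each table and that k1 is never NULL in a real row, since a.k1 IS NULL is used to mean "no row in ab".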
Answer 1 (score: 0)
Would something like this work for you?
SELECT Z.k1, Z.k2, Z.val, Y.k1, Y.k2, Y.val, Y.add_cd, X.k1, X.k2, X.val, X.add_ef
FROM @ab AS Z
FULL OUTER JOIN @cd AS Y ON Z.k1 = Y.k1 AND Z.k2 = Y.k2
FULL OUTER JOIN @ef AS X ON X.k1 = Y.k1 AND X.k2 = Y.k2
WHERE NOT EXISTS (
SELECT A.k1, A.k2, A.val, C.k1, C.k2, C.val, C.add_cd, E.k1, E.k2, E.val, E.add_ef
FROM @ab AS A
INNER JOIN @cd AS C ON A.k1 = C.k1 AND A.k2 = C.k2 AND A.val = C.val
INNER JOIN @ef AS E ON C.k1 = E.k1 AND C.k2 = E.k2 AND C.val = E.val
WHERE Z.k1 = A.k1 AND Z.k2 = A.k2 AND Y.k1 = C.k1 AND Y.k2 = C.k2 AND X.k1 = E.k1 AND X.k2 = E.k2
)
My concern is that your NULLs may behave subtly differently here from how you want them to be compared...
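The NULL subtlety mentioned above can be seen in a tiny stand-alone sketch. It assumes the default ANSI_NULLS ON setting, and the variable names are made up for illustration. Because A.val = C.val is never true when either side is NULL, the inner joins inside the NOT EXISTS would not treat a key whose val is NULL in all three tables as a match, so such a key would be reported as a difference.

```sql
-- Two NULL values of the same type as the val column.
DECLARE @left  char(1) = NULL,
        @right char(1) = NULL;

SELECT
    -- NULL = NULL evaluates to UNKNOWN, so the ELSE branch is taken.
    CASE WHEN @left =  @right THEN 'true' ELSE 'false or unknown' END AS equals_test,
    -- NULL <> NULL is also UNKNOWN, so the ELSE branch is taken here too.
    CASE WHEN @left <> @right THEN 'true' ELSE 'false or unknown' END AS not_equals_test;
```

Both columns come back as 'false or unknown', which is why neither = nor <> on its own can decide whether two nullable values match.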
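Answer 2 (score: 0)
I think you are on the right track with full outer join; you just need to get the where clause working for you. This may not be the most efficient answer, but it will do the job.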
select *
from @ab as ab
full outer join @cd as cd on ab.k1 = cd.k1
and ab.k2 = cd.k2
full outer join @ef as ef on ab.k1 = ef.k1
and ab.k2 = ef.k2
where (
isnull(ab.val, 'X') <> isnull(cd.val, 'XX')
or
isnull(ab.val, 'X') <> isnull(ef.val, 'XX')
or
isnull(cd.val, 'X') <> isnull(ef.val, 'XX')
or
coalesce(ab.val, cd.val, ef.val) is NULL
)
order by coalesce(ab.k1, cd.k1, ef.k1)
, coalesce(ab.k2, cd.k2, ef.k2)
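The parentheses wrap the entire where clause in case you add another constraint later (you do not want and / or precedence to change the logic). The order by clause is only there to help match the order of the expected output shown in the question.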
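One detail worth checking before relying on the 'X' / 'XX' placeholders: ISNULL returns the data type of its first argument, so with a char(1) column the replacement 'XX' is silently cut down to 'X'. The sketch below uses a hypothetical variable of the same type as val to show the difference from COALESCE.

```sql
-- Same type as the val column in the question.
DECLARE @val char(1) = NULL;

SELECT
    ISNULL(@val, 'XX')   AS isnull_result,    -- typed as char(1): returns 'X'
    COALESCE(@val, 'XX') AS coalesce_result;  -- typed by data type precedence: returns 'XX'
```

So when both val columns are NULL, isnull(ab.val, 'X') <> isnull(cd.val, 'XX') compares 'X' with 'X' and is false; rows such as key 5, 5 are then picked up only by the coalesce(...) is NULL test. It also means a real val of 'X' would compare as equal to a NULL on the other side.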