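I'm trying to remove duplicates from a list of names and email addresses. The last statement in my script is an UPDATE, and it takes far longer than it should; I have never actually waited for it to finish. If I put PRINT 'anything'; directly in front of that statement, it returns immediately. The behavior is the same with or without the WHILE() loop.
Here is a simplified version that I hope illustrates the problem. I'm actually building a table-valued function, so I can't leave the PRINT in there. What effect could the PRINT be having?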
SQL Server 10.50.4033(2008 R2)
    DECLARE @duplicate_names TABLE (
        dnDuplicateKey int,
        dnPrimaryKey int,
        PRIMARY KEY (
            dnPrimaryKey,
            dnDuplicateKey
        )
    );

    DECLARE @matches TABLE (
        mFirstKey int,
        mSecondKey int,
        PRIMARY KEY (
            mFirstKey,
            mSecondKey
        )
    );
    --Find Email matches
    INSERT INTO @matches
    SELECT DISTINCT
        f.elKey,
        s.elKey
    FROM
        Emails f INNER JOIN
        Emails s
            ON f.elEMail = s.elEMail;

    --Find name matches
    INSERT INTO @matches
    SELECT
        f.NameKey,
        s.NameKey
    FROM
        Names f INNER JOIN
        Names s
            ON f.Name = s.Name
    WHERE
        NOT EXISTS (
            SELECT
                *
            FROM
                @matches
            WHERE
                mFirstKey = f.NameKey
                AND mSecondKey = s.NameKey
        );

    --Condense duplicate matches
    --  1 = 2,
    --  2 = 1,
    --  3 = 4,
    --  4 = 3
    --to
    --  1 = 2,
    --  3 = 4
    INSERT INTO @duplicate_names
    SELECT
        mSecondKey,
        MIN(mFirstKey)
    FROM
        @matches
    GROUP BY
        mSecondKey;

    --Condense chained matches
    --  1 = 2,
    --  2 = 3,
    --  3 = 4
    --to
    --  1 = 2,
    --  1 = 3,
    --  1 = 4
    WHILE(@@ROWCOUNT > 0)
        UPDATE
            d
        SET
            d.dnPrimaryKey = f.dnPrimaryKey
        FROM
            @duplicate_names d INNER JOIN (
                @duplicate_names f INNER JOIN
                @duplicate_names s
                    ON f.dnDuplicateKey = s.dnPrimaryKey
            ) ON d.dnDuplicateKey = s.dnDuplicateKey
        WHERE
            d.dnPrimaryKey <> f.dnPrimaryKey;
Answer 0 (score: 0)
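So yes, this was just the performance difference between table variables and temp tables. For anyone interested, here is how I solved it without using temp tables. It's still slower than the first approach, but since I can't use temp tables in a table-valued function, it's the only option I see.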
    DECLARE @duplicate_names TABLE (
        dnDuplicateKey int,
        dnPrimaryKey int,
        PRIMARY KEY (
            dnPrimaryKey,
            dnDuplicateKey
        )
    );

    DECLARE @matches TABLE (
        mFirstKey int,
        mSecondKey int,
        PRIMARY KEY (
            mFirstKey,
            mSecondKey
        )
    );
    --Find Email matches
    INSERT INTO @matches
    SELECT DISTINCT
        f.elKey,
        s.elKey
    FROM
        Emails f INNER JOIN
        Emails s
            ON f.elEMail = s.elEMail;

    --Find name matches
    INSERT INTO @matches
    SELECT
        f.NameKey,
        s.NameKey
    FROM
        Names f INNER JOIN
        Names s
            ON f.Name = s.Name
    WHERE
        NOT EXISTS (
            SELECT
                *
            FROM
                @matches
            WHERE
                mFirstKey = f.NameKey
                AND mSecondKey = s.NameKey
        );

    --Expand orphaned matches (for which no reciprocal version exists).
    WHILE(@@ROWCOUNT > 0)
        INSERT INTO @matches
        SELECT DISTINCT
            f.mFirstKey,
            s.mSecondKey
        FROM
            @matches f INNER JOIN
            @matches s
                ON f.mSecondKey = s.mFirstKey
        WHERE
            s.mSecondKey <> f.mFirstKey
            AND NOT EXISTS (
                SELECT
                    *
                FROM
                    @matches d
                WHERE
                    d.mFirstKey = f.mFirstKey
                    AND d.mSecondKey = s.mSecondKey
            );

    --Condense duplicate matches
    --  1 = 2,
    --  2 = 1,
    --  3 = 4,
    --  4 = 3
    --to
    --  1 = 2,
    --  3 = 4
    INSERT INTO @duplicate_names
    SELECT
        mSecondKey,
        MIN(mFirstKey)
    FROM
        @matches
    GROUP BY
        mSecondKey;
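
For comparison, here is a rough sketch of what a temp-table version of the chained-match UPDATE from the question could look like outside a function. It is not code from the original post: the `#duplicate_names` table and its three sample rows are hypothetical, chosen only to show the 1 = 2, 2 = 3, 3 = 4 chain collapsing. One commonly cited reason for the speed difference is that temp tables can carry column statistics while table variables cannot, so the optimizer may badly underestimate row counts for the three-way self-join on a table variable.

```sql
-- Hypothetical temp table mirroring @duplicate_names from the question.
CREATE TABLE #duplicate_names (
    dnDuplicateKey int NOT NULL,
    dnPrimaryKey   int NOT NULL,
    PRIMARY KEY (dnPrimaryKey, dnDuplicateKey)
);

-- Made-up sample chain: 1 = 2, 2 = 3, 3 = 4.
INSERT INTO #duplicate_names (dnDuplicateKey, dnPrimaryKey)
VALUES (2, 1), (3, 2), (4, 3);

-- Keep re-pointing each row at its primary's own primary until an
-- UPDATE pass changes no rows. The INSERT above sets @@ROWCOUNT to 3,
-- so the loop is entered on the first check.
WHILE (@@ROWCOUNT > 0)
    UPDATE d
    SET d.dnPrimaryKey = f.dnPrimaryKey
    FROM #duplicate_names d
    INNER JOIN #duplicate_names s
        ON d.dnDuplicateKey = s.dnDuplicateKey
    INNER JOIN #duplicate_names f
        ON f.dnDuplicateKey = s.dnPrimaryKey
    WHERE d.dnPrimaryKey <> f.dnPrimaryKey;

-- Every duplicate now points at key 1: rows (2, 1), (3, 1), (4, 1).
SELECT dnDuplicateKey, dnPrimaryKey
FROM #duplicate_names
ORDER BY dnDuplicateKey;

DROP TABLE #duplicate_names;
```

Because temp tables are not allowed inside a table-valued function, this only works in a script or stored procedure, which is why the answer above falls back to table variables.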