How to speed up a SQL query with a JOIN on a large varchar field and NOT EXISTS

Asked: 2015-12-29 15:40:20

Tags: sql sql-server performance tsql

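I have this query that takes forever to run. The table contains about 7 million rows. Everything else I do with it (it is a "temporary" permanent table) is relatively fast (an hour or so), while this one UPDATE takes 7 hours! We are on SQL Server 2014.

DOI is an NVARCHAR(72) with a non-unique CLUSTERED index on it. Affiliations is VARCHAR(8000). I am really not allowed to change these data types. Affiliations is in an index as an included column. Because the field is so large, we cannot create a "regular" index on it.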

CREATE NONCLUSTERED INDEX IX_Affiliations 
    ON TempSourceTable (DOI) INCLUDE (Affiliations);

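The following statement sets a bit field to 1 if all records for a DOI have the same value in the Affiliations column. The table has multiple records per DOI value, and we want to know whether the Affiliations column is the same for all records with the same DOI.

Is there any way to speed this up, such as a different query or a different index, or am I missing something?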

UPDATE S
SET AffiliationsSameForAllDOI = 1
FROM TempSourceTable S
WHERE NOT EXISTS (SELECT 1 
                  FROM TempSourceTable S2 
                  WHERE S2.DOI = S.DOI 
                    AND S2.Affiliations <> S.Affiliations)

3 Answers:

Answer 0: (score: 6)

Here is another way.

Sub-query version

UPDATE TempSourceTable
SET    AffiliationsSameForAllDOI = 1
WHERE  doi IN (SELECT doi
               FROM   TempSourceTable S
               GROUP  BY DOI
               HAVING COUNT(DISTINCT Affiliations) = 1) 

EXISTS version

UPDATE S
SET    AffiliationsSameForAllDOI = 1
FROM   TempSourceTable S
WHERE EXISTS  (SELECT 1
               FROM   TempSourceTable S1
               WHERE  S1.DOI = S.DOI
               HAVING COUNT(DISTINCT S1.Affiliations) = 1)

INNER JOIN version

UPDATE S 
SET    AffiliationsSameForAllDOI = 1
FROM TempSourceTable S
INNER JOIN (SELECT doi
            FROM   TempSourceTable 
            GROUP  BY DOI
            HAVING COUNT(DISTINCT Affiliations) = 1) S1
        ON S.DOI = S1.DOI
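
Before running any of these UPDATEs against 7 million rows, it may help to preview how many rows would be flagged. The following is only a sketch that reuses the GROUP BY / HAVING subquery from the INNER JOIN version in a read-only SELECT; the column alias RowsToFlag is made up for illustration.

```sql
-- Read-only preview: counts the rows the INNER JOIN version would set to 1.
-- RowsToFlag is a hypothetical alias, not a column from the question.
SELECT COUNT(*) AS RowsToFlag
FROM TempSourceTable S
INNER JOIN (SELECT DOI
            FROM   TempSourceTable
            GROUP  BY DOI
            HAVING COUNT(DISTINCT Affiliations) = 1) S1
        ON S.DOI = S1.DOI;
```

One difference from the NOT EXISTS query in the question is worth keeping in mind: COUNT(DISTINCT ...) ignores NULLs, so a DOI whose Affiliations values are all NULL yields a count of 0 and is not flagged by these versions, whereas the original NOT EXISTS query would flag it.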

Answer 1: (score: 4)

update TempSourceTable
set AffiliationsSameForAllDOI = 1
where DOI in (
    select DOI
    from TempSourceTable
    group by DOI
    having count(distinct Affiliations) = 1
)

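Depending on what your data looks like, you might be able to squeeze out more performance by creating a computed column that holds, say, the first 16 characters of Affiliations, or simply its checksum(), and then indexing on that column instead. Perhaps it would look something like this: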

update TempSourceTable
set AffiliationsSameForAllDOI = 1
where DOI in (
    select DOI
    from TempSourceTable
    where DOI in (
        select DOI
        from TempSourceTable
        group by DOI
        having count(distinct AffiliationsChecksum) = 1
    )
    group by DOI
    having count(distinct Affiliations) = 1
)
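
The query above assumes that an AffiliationsChecksum column already exists. A minimal sketch of how it might be created, assuming the computed column is defined with CHECKSUM() and indexed together with DOI; the index name IX_DOI_AffiliationsChecksum is hypothetical, and GO is the batch separator used by SSMS and sqlcmd rather than a T-SQL statement.

```sql
-- Computed column over the wide VARCHAR(8000) field; the underlying
-- column's data type is left unchanged.
ALTER TABLE TempSourceTable
    ADD AffiliationsChecksum AS CHECKSUM(Affiliations);
GO

-- Narrow index on (DOI, checksum). The index name is an assumption.
CREATE NONCLUSTERED INDEX IX_DOI_AffiliationsChecksum
    ON TempSourceTable (DOI, AffiliationsChecksum);
GO
```

Different strings can share the same checksum, which is why the outer query still compares the real Affiliations values; the checksum only narrows the set of DOIs that need the expensive comparison.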

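Answer 2: (score: 0)

I would expect this to perform better than the others, since it should execute in a single scan over the index. In addition, the min/max 'trick' avoids having to collect and maintain every distinct value.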

WITH X AS 
(
  SELECT *,
    AffiliationsSameForAllDOI_New =
      CASE WHEN MAX(Affiliations) OVER (PARTITION BY DOI)
        = MIN(Affiliations) OVER (PARTITION BY DOI)
      THEN 1
      ELSE 0
      END
  FROM TempSourceTable 
)
UPDATE X
SET AffiliationsSameForAllDOI = AffiliationsSameForAllDOI_New
WHERE AffiliationsSameForAllDOI_New = 1
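
To try the min/max approach on a small scale first, something like the following could be run. It is only a sketch: the temp table #DemoSource, the sample DOI values, and the IsSame alias are invented for illustration, and only the column names come from the question.

```sql
-- Hypothetical demo table; only the column names come from the question.
CREATE TABLE #DemoSource
(
    DOI                       NVARCHAR(72)  NOT NULL,
    Affiliations              VARCHAR(8000) NULL,
    AffiliationsSameForAllDOI BIT           NOT NULL DEFAULT 0
);

-- DOI 'a' has identical affiliations; DOI 'b' has two different ones.
INSERT INTO #DemoSource (DOI, Affiliations)
VALUES (N'10.1000/a', 'Univ X'),
       (N'10.1000/a', 'Univ X'),
       (N'10.1000/b', 'Univ X'),
       (N'10.1000/b', 'Univ Y');

-- If MAX and MIN agree within a DOI, every non-NULL value is the same.
WITH X AS
(
    SELECT AffiliationsSameForAllDOI,
           CASE WHEN MAX(Affiliations) OVER (PARTITION BY DOI)
                   = MIN(Affiliations) OVER (PARTITION BY DOI)
                THEN 1 ELSE 0 END AS IsSame
    FROM #DemoSource
)
UPDATE X
SET AffiliationsSameForAllDOI = 1
WHERE IsSame = 1;

SELECT DOI, Affiliations, AffiliationsSameForAllDOI
FROM #DemoSource
ORDER BY DOI;

DROP TABLE #DemoSource;
```

With this sample data, both rows for 10.1000/a should end up flagged with 1, and both rows for 10.1000/b should stay at 0.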