How can I perform mass database updates efficiently?

Posted: 2009-04-09 16:12:33

Tags: sql sql-update temp-tables

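I have a table that contains some duplicate entries. I have to discard all of them but one, and then update the one that remains. I tried it with a temp table and a WHILE loop, like this: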

CREATE TABLE #tmp_ImportedData_GenericData
(
    Id int identity(1,1),
    tmpCode varchar(255)  NULL,
    tmpAlpha3Code varchar(50)  NULL,
    tmpRelatedYear int NOT NULL,
    tmpPreviousValue varchar(255)  NULL,
    tmpGrowthRate varchar(255)  NULL
)

INSERT INTO #tmp_ImportedData_GenericData
    (tmpCode, tmpAlpha3Code, tmpRelatedYear, tmpPreviousValue, tmpGrowthRate)
SELECT
    MCS_ImportedData_GenericData.Code,
    MCS_ImportedData_GenericData.Alpha3Code,
    MCS_ImportedData_GenericData.RelatedYear,
    MCS_ImportedData_GenericData.PreviousValue,
    MCS_ImportedData_GenericData.GrowthRate
FROM MCS_ImportedData_GenericData
INNER JOIN
(
    SELECT CODE, ALPHA3CODE, RELATEDYEAR, COUNT(*) AS NUMROWS
    FROM MCS_ImportedData_GenericData AS M
    GROUP BY M.CODE, M.ALPHA3CODE, M.RELATEDYEAR
    HAVING count(*) > 1
) AS M2 ON MCS_ImportedData_GenericData.CODE = M2.CODE
    AND MCS_ImportedData_GenericData.ALPHA3CODE = M2.ALPHA3CODE
    AND MCS_ImportedData_GenericData.RELATEDYEAR = M2.RELATEDYEAR
WHERE
(MCS_ImportedData_GenericData.PreviousValue <> 'INDEFINITO')

 -- SELECT * from #tmp_ImportedData_GenericData
 -- DROP TABLE #tmp_ImportedData_GenericData

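DECLARE @counter int
DECLARE @rowsCount int
-- Values of the current temp-table row
DECLARE @Code varchar(255), @Alpha3Code varchar(50), @RelatedYear int
DECLARE @OldValue varchar(255), @GrowthRate varchar(255)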

SET @counter = 1

SELECT @rowsCount =  count(*) from #tmp_ImportedData_GenericData
-- PRINT @rowsCount

WHILE @counter <= @rowsCount  -- <= so that the last row (Id = @rowsCount) is processed too
BEGIN
    SELECT 
        @Code = tmpCode, 
        @Alpha3Code = tmpAlpha3Code, 
        @RelatedYear = tmpRelatedYear, 
        @OldValue = tmpPreviousValue, 
        @GrowthRate = tmpGrowthRate 
    FROM 
        #tmp_ImportedData_GenericData
    WHERE 
        Id = @counter

    DELETE FROM MCS_ImportedData_GenericData 
    WHERE 
        Code = @Code 
        AND Alpha3Code = @Alpha3Code  
        AND RelatedYear = @RelatedYear  
        AND (PreviousValue <> 'INDEFINITO' OR PreviousValue IS NULL)  -- parentheses needed: AND binds tighter than OR

    UPDATE 
        MCS_ImportedData_GenericData 
        SET 
          PreviousValue = @OldValue, GrowthRate = @GrowthRate 
    WHERE 
        Code = @Code 
        AND Alpha3Code = @Alpha3Code  
        AND RelatedYear = @RelatedYear  
        AND MCS_ImportedData_GenericData.PreviousValue ='INDEFINITO'

    SET @counter = @counter + 1
END

But even with only 20,000-30,000 rows to process, it takes a very long time.

Does anyone have suggestions for improving performance?

Thanks in advance!
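A likely cause of the slowness described above is the row-by-row loop itself. Every iteration runs its own DELETE and UPDATE, and without an index on (Code, Alpha3Code, RelatedYear) each of those may have to scan the whole table. The same intent can be expressed as three set-based statements: save one set of real values per key, delete the duplicates, then update the surviving 'INDEFINITO' rows. The sketch below runs against SQLite (3.25 or later, for ROW_NUMBER()) through Python's standard sqlite3 module so it can be tried standalone. The table name GenericData, the sample rows, and the rule that the first-inserted duplicate supplies the values are assumptions, and the statements would need dialect changes for SQL Server. Unlike the loop, it only removes rows from keys that actually have an 'INDEFINITO' row.

```python
import sqlite3

# Hypothetical SQLite stand-in for MCS_ImportedData_GenericData.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE GenericData (
    Code TEXT, Alpha3Code TEXT, RelatedYear INTEGER,
    PreviousValue TEXT, GrowthRate TEXT)""")
conn.executemany("INSERT INTO GenericData VALUES (?, ?, ?, ?, ?)", [
    ("GDP", "ITA", 2008, "INDEFINITO", None),  # placeholder row that should survive
    ("GDP", "ITA", 2008, "1.5", "0.2"),        # duplicate carrying the real values
    ("GDP", "ITA", 2008, None, None),          # NULL duplicate, also removed
    ("GDP", "FRA", 2008, "2.1", "0.3"),        # no duplicate: left alone
])

# Correlates alias d with the GenericData row being deleted or updated.
KEY = ("d.Code = GenericData.Code AND d.Alpha3Code = GenericData.Alpha3Code "
       "AND d.RelatedYear = GenericData.RelatedYear")

with conn:  # commit everything together, roll back on error
    # 1. One set of real values per key (first-inserted non-'INDEFINITO' row).
    conn.execute("""
        CREATE TEMP TABLE src AS
        SELECT Code, Alpha3Code, RelatedYear, PreviousValue, GrowthRate
        FROM (SELECT *, ROW_NUMBER() OVER (
                  PARTITION BY Code, Alpha3Code, RelatedYear ORDER BY rowid) AS rn
              FROM GenericData
              WHERE PreviousValue <> 'INDEFINITO')
        WHERE rn = 1""")
    # 2. Delete every non-placeholder row (NULLs included) of keys that have
    #    an 'INDEFINITO' row. The OR is parenthesized on purpose.
    conn.execute(f"""
        DELETE FROM GenericData
        WHERE (PreviousValue <> 'INDEFINITO' OR PreviousValue IS NULL)
          AND EXISTS (SELECT 1 FROM GenericData AS d
                      WHERE {KEY} AND d.PreviousValue = 'INDEFINITO')""")
    # 3. Copy the saved values onto the placeholder rows that remain.
    conn.execute(f"""
        UPDATE GenericData
        SET PreviousValue = (SELECT d.PreviousValue FROM src AS d WHERE {KEY}),
            GrowthRate    = (SELECT d.GrowthRate FROM src AS d WHERE {KEY})
        WHERE PreviousValue = 'INDEFINITO'
          AND EXISTS (SELECT 1 FROM src AS d WHERE {KEY})""")

rows = conn.execute("SELECT * FROM GenericData ORDER BY Alpha3Code").fetchall()
print(rows)
```

The number of statements stays at three no matter how many duplicate groups there are, whereas the loop issues two statements per temp-table row.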

3 answers:

Answer 0 (score: 3)

WITH q AS (
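        -- rn = 1 marks the row to keep: rows whose PreviousValue is not
        -- 'INDEFINITO' sort first within each (CODE, ALPHA3CODE, RELATEDYEAR) group.
        -- No WHERE filter, so an 'INDEFINITO' row is deleted whenever its
        -- group also contains a row that is not 'INDEFINITO'.
        SELECT  m.*,
                ROW_NUMBER() OVER (PARTITION BY CODE, ALPHA3CODE, RELATEDYEAR
                                   ORDER BY CASE WHEN PreviousValue = 'INDEFINITO' THEN 1 ELSE 0 END) AS rn
        FROM    MCS_ImportedData_GenericData m
        )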
DELETE
FROM    q
WHERE   rn > 1
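To experiment with the ROW_NUMBER() idea outside SQL Server, here is a minimal sketch against SQLite (3.25 or later, for window functions) via Python's standard sqlite3 module. SQLite cannot DELETE through a CTE, so this version swaps that T-SQL feature for a `rowid IN (...)` filter. The table name GenericData and the rows are hypothetical.

```python
import sqlite3

# Hypothetical SQLite stand-in for MCS_ImportedData_GenericData.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE GenericData (
    Code TEXT, Alpha3Code TEXT, RelatedYear INTEGER,
    PreviousValue TEXT, GrowthRate TEXT)""")
conn.executemany("INSERT INTO GenericData VALUES (?, ?, ?, ?, ?)", [
    ("GDP", "ITA", 2008, "INDEFINITO", None),  # ranked 2 in its group: deleted
    ("GDP", "ITA", 2008, "1.5", "0.2"),        # ranked 1: kept
    ("GDP", "FRA", 2008, "2.1", "0.3"),        # only row of its group: kept
])

# Number each key's rows so non-'INDEFINITO' rows come first, then delete
# everything except rn = 1, matching the numbered rows back by rowid.
with conn:
    conn.execute("""
        DELETE FROM GenericData
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY Code, Alpha3Code, RelatedYear
                           ORDER BY CASE WHEN PreviousValue = 'INDEFINITO'
                                         THEN 1 ELSE 0 END) AS rn
                FROM GenericData)
            WHERE rn > 1)""")

rows = conn.execute("SELECT * FROM GenericData ORDER BY Alpha3Code").fetchall()
print(rows)
```

With this sample data the surviving ITA row is the one carrying real values, which is the same end state the question's delete-then-update loop aims for.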

Answer 1 (score: 1)

Quassnoi's answer uses SQL Server 2005+ syntax, so I thought I'd put in my tuppence worth using something more generic...

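First, to delete all the duplicates except the "original", you need a way to tell the duplicate records apart. (That is what the ROW_NUMBER() part of Quassnoi's answer does.)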

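In your case the source data doesn't appear to have an identity column (you create one in the temp table). If that is so, two options come to mind:
1. Add an identity column to the data, then delete the duplicates.
2. Create a de-duplicated data set, delete everything from the original data, and insert the de-duplicated data back into the original table.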

Option 1 could look like this... (using the newly created id field)

DELETE
   [data]
FROM
   MCS_ImportedData_GenericData AS [data]
WHERE
   id > (
         SELECT
            MIN(id)
         FROM
            MCS_ImportedData_GenericData
         WHERE
            CODE = [data].CODE
            AND ALPHA3CODE = [data].ALPHA3CODE
            AND RELATEDYEAR = [data].RELATEDYEAR
        )

... OR

DELETE
   [data]
FROM
   MCS_ImportedData_GenericData AS [data]
INNER JOIN
(
   SELECT
      MIN(id) AS [id],
      CODE,
      ALPHA3CODE,
      RELATEDYEAR
   FROM
      MCS_ImportedData_GenericData
   GROUP BY
      CODE,
      ALPHA3CODE,
      RELATEDYEAR
)
AS [original]
   ON [original].CODE = [data].CODE
   AND [original].ALPHA3CODE = [data].ALPHA3CODE
   AND [original].RELATEDYEAR = [data].RELATEDYEAR
   AND [original].id <> [data].id
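Option 2 from the list above is not spelled out in code in this answer. Here is a minimal sketch, again using SQLite through Python's standard sqlite3 module so it runs standalone. SQLite's implicit rowid decides which duplicate counts as the "original" (the first one inserted), and that choice is an assumption. On a table without such a column, a GROUP BY with aggregates (or SELECT DISTINCT, for exact duplicates) would have to make that decision instead. The table name GenericData and the rows are hypothetical.

```python
import sqlite3

# Hypothetical SQLite stand-in for MCS_ImportedData_GenericData.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE GenericData (
    Code TEXT, Alpha3Code TEXT, RelatedYear INTEGER,
    PreviousValue TEXT, GrowthRate TEXT)""")
conn.executemany("INSERT INTO GenericData VALUES (?, ?, ?, ?, ?)", [
    ("GDP", "ITA", 2008, "1.5", "0.2"),        # first row of its key: kept
    ("GDP", "ITA", 2008, "INDEFINITO", None),  # later duplicate: dropped
    ("GDP", "FRA", 2008, "2.1", "0.3"),        # unique: kept
])

with conn:  # all steps commit together, or roll back on error
    # 1. De-duplicated copy: the lowest-rowid row of each key.
    conn.execute("""
        CREATE TEMP TABLE dedup AS
        SELECT * FROM GenericData
        WHERE rowid IN (SELECT MIN(rowid) FROM GenericData
                        GROUP BY Code, Alpha3Code, RelatedYear)""")
    # 2. Empty the original table.
    conn.execute("DELETE FROM GenericData")
    # 3. Put the de-duplicated rows back and discard the scratch table.
    conn.execute("INSERT INTO GenericData SELECT * FROM dedup")
    conn.execute("DROP TABLE dedup")

rows = conn.execute("SELECT * FROM GenericData ORDER BY Alpha3Code").fetchall()
print(rows)
```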

Answer 2 (score: 0)

I don't fully understand the syntax used, so I can't post an exact answer, but here is one way to do it.

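1. Identify the rows to keep (e.g. select value, ... from ... where ...).
2. Apply the update logic while identifying them (e.g. select value + 1 ... from ...).
3. Insert the result of that select into a new table.
4. Drop the original, rename the new table to the original name, and recreate all grants/synonyms/triggers/indexes/FKs/... (or truncate the original and insert-select from the new table).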

Obviously this has a lot of overhead, but if you want to update or purge millions of rows, it will be the fastest way.
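The copy-and-swap steps can be sketched the same way. Assumptions: SQLite (3.25 or later) via Python's standard sqlite3 module, a hypothetical GenericData table with one index, and the ROW_NUMBER() ranking from answer 0 as the "rows to keep" rule. `CREATE TABLE ... AS SELECT` copies data but no indexes or constraints, which is why the index is recreated after the rename. On SQL Server the rough equivalents would be `SELECT ... INTO` and `sp_rename`.

```python
import sqlite3

# Hypothetical SQLite stand-in for MCS_ImportedData_GenericData.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE GenericData (
    Code TEXT, Alpha3Code TEXT, RelatedYear INTEGER,
    PreviousValue TEXT, GrowthRate TEXT)""")
conn.executemany("INSERT INTO GenericData VALUES (?, ?, ?, ?, ?)", [
    ("GDP", "ITA", 2008, "INDEFINITO", None),
    ("GDP", "ITA", 2008, "1.5", "0.2"),
    ("GDP", "FRA", 2008, "2.1", "0.3"),
])
conn.execute("CREATE INDEX ix_key ON GenericData (Code, Alpha3Code, RelatedYear)")

with conn:
    # Steps 1-3: select only the rows to keep (any update logic could be
    # applied in this SELECT too) and write them into a new table.
    conn.execute("""
        CREATE TABLE GenericData_new AS
        SELECT Code, Alpha3Code, RelatedYear, PreviousValue, GrowthRate
        FROM (SELECT *, ROW_NUMBER() OVER (
                  PARTITION BY Code, Alpha3Code, RelatedYear
                  ORDER BY CASE WHEN PreviousValue = 'INDEFINITO'
                                THEN 1 ELSE 0 END) AS rn
              FROM GenericData)
        WHERE rn = 1""")
    # Step 4: swap the tables, then recreate what CREATE TABLE ... AS lost.
    conn.execute("DROP TABLE GenericData")  # also drops ix_key
    conn.execute("ALTER TABLE GenericData_new RENAME TO GenericData")
    conn.execute("CREATE INDEX ix_key ON GenericData (Code, Alpha3Code, RelatedYear)")

rows = conn.execute("SELECT * FROM GenericData ORDER BY Alpha3Code").fetchall()
indexes = [r[1] for r in conn.execute("PRAGMA index_list('GenericData')")]
print(rows, indexes)
```

The single index stands in for everything step 4 lists. In a real schema, each grant, trigger, and foreign key would need the same treatment, and that is where most of the overhead mentioned above comes from.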