我有一个包含一些重复条目的表。除了一个,我必须丢弃所有,然后更新这个最新的。我用这种方式尝试了一个临时表和一个while语句:
CREATE TABLE #tmp_ImportedData_GenericData
(
Id int identity(1,1),
tmpCode varchar(255) NULL,
tmpAlpha3Code varchar(50) NULL,
tmpRelatedYear int NOT NULL,
tmpPreviousValue varchar(255) NULL,
tmpGrowthRate varchar(255) NULL
)
INSERT INTO #tmp_ImportedData_GenericData
SELECT
MCS_ImportedData_GenericData.Code,
MCS_ImportedData_GenericData.Alpha3Code,
MCS_ImportedData_GenericData.RelatedYear,
MCS_ImportedData_GenericData.PreviousValue,
MCS_ImportedData_GenericData.GrowthRate
FROM MCS_ImportedData_GenericData
INNER JOIN
(
SELECT CODE, ALPHA3CODE, RELATEDYEAR, COUNT(*) AS NUMROWS
FROM MCS_ImportedData_GenericData AS M
GROUP BY M.CODE, M.ALPHA3CODE, M.RELATEDYEAR
HAVING count(*) > 1
) AS M2 ON MCS_ImportedData_GenericData.CODE = M2.CODE
AND MCS_ImportedData_GenericData.ALPHA3CODE = M2.ALPHA3CODE
AND MCS_ImportedData_GenericData.RELATEDYEAR = M2.RELATEDYEAR
WHERE
(MCS_ImportedData_GenericData.PreviousValue <> 'INDEFINITO')
-- SELECT * from #tmp_ImportedData_GenericData
-- DROP TABLE #tmp_ImportedData_GenericData
DECLARE @counter int
DECLARE @rowsCount int
SET @counter = 1
SELECT @rowsCount = count(*) from #tmp_ImportedData_GenericData
-- PRINT @rowsCount
WHILE @counter < @rowsCount
BEGIN
SELECT
@Code = tmpCode,
@Alpha3Code = tmpAlpha3Code,
@RelatedYear = tmpRelatedYear,
@OldValue = tmpPreviousValue,
@GrowthRate = tmpGrowthRate
FROM
#tmp_ImportedData_GenericData
WHERE
Id = @counter
DELETE FROM MCS_ImportedData_GenericData
WHERE
Code = @Code
AND Alpha3Code = @Alpha3Code
AND RelatedYear = @RelatedYear
AND PreviousValue <> 'INDEFINITO' OR PreviousValue IS NULL
UPDATE
MCS_ImportedData_GenericData
SET
PreviousValue = @OldValue, GrowthRate = @GrowthRate
WHERE
Code = @Code
AND Alpha3Code = @Alpha3Code
AND RelatedYear = @RelatedYear
AND MCS_ImportedData_GenericData.PreviousValue ='INDEFINITO'
SET @counter = @counter + 1
END
但即使只有20000 - 30000行要处理,也需要很长时间。
有人提出一些建议以提高效果吗?
提前致谢!
答案 0 :(得分:3)
WITH q AS (
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY CODE, ALPHA3CODE, RELATEDYEAR ORDER BY CASE WHEN PreviousValue = 'INDEFINITO' THEN 1 ELSE 0 END)
FROM MCS_ImportedData_GenericData m
WHERE PreviousValue <> 'INDEFINITO'
)
DELETE
FROM q
WHERE rn > 1
答案 1 :(得分:1)
Quassnoi的回答使用了SQL Server 2005+语法,所以我认为我会把我的tuppence值得使用更通用的东西......
首先,要删除所有重复项,而不是“原始”重复项,您需要一种区分重复记录的方法。 (Quassnoi答案的ROW_NUMBER()部分)
在您的情况下,源数据似乎没有标识列(您在临时表中创建一个)。如果是这种情况,我会想到两种选择:
1.将标识列添加到数据中,然后删除重复项
2.创建“去除”数据集,删除原始数据中的所有内容,并将去重复数据插入原始数据
选项1可能是...... (使用新创建的ID字段)
DELETE
[data]
FROM
MCS_ImportedData_GenericData AS [data]
WHERE
id > (
SELECT
MIN(id)
FROM
MCS_ImportedData_GenericData
WHERE
CODE = [data].CODE
AND ALPHA3CODE = [data].ALPHA3CODE
AND RELATEDYEAR = [data].RELATEDYEAR
)
... OR
DELETE
[data]
FROM
MCS_ImportedData_GenericData AS [data]
INNER JOIN
(
SELECT
MIN(id) AS [id],
CODE,
ALPHA3CODE,
RELATEDYEAR
FROM
MCS_ImportedData_GenericData
GROUP BY
CODE,
ALPHA3CODE,
RELATEDYEAR
)
AS [original]
ON [original].CODE = [data].CODE
AND [original].ALPHA3CODE = [data].ALPHA3CODE
AND [original].RELATEDYEAR = [data].RELATEDYEAR
AND [original].id <> [data].id
答案 2 :(得分:0)
我不完全理解使用过的语法来发布确切的答案,但这是一种方法。
识别要保留的行(例如,选择值,......来自...其中......)
在识别时执行更新逻辑(例如,从......中选择值+ 1 ...)
将select select插入新表。
删除原始内容,将new重命名为original,重新创建所有授权/同义词/触发器/索引/ FKs / ...(或截断原始内容并从新内容中插入select)
显然这有很大的开销,但如果你想更新/清除数百万行,那将是最快的方式。