我正在尝试找到一种在SQL Server中查找重复项的更好方法。在SSMS结果窗口中显示结果之前,这需要超过20分钟才能运行,只有超过3亿条记录。在坠毁之前又过了22分钟。
然后SSMS在显示16,777,216条记录后抛出此错误:
An error occurred while executing batch. Error message is: Exception of type 'System.OutOfMemoryException' was thrown.
架构:
ENCOUNTER_NUM - numeric(22,0)
CONCEPT_CD - varchar(50)
PROVIDER_ID - varchar(50)
START_DATE - datetime
MODIFIER_CD - varchar(100)
INSTANCE_NUM - numeric(18,0)
SELECT
ROW_NUMBER() OVER (ORDER BY f1.[ENCOUNTER_NUM],f1.[CONCEPT_CD],f1.[PROVIDER_ID],f1.[START_DATE],f1.[MODIFIER_CD],f1.[INSTANCE_NUM]),
f1.[ENCOUNTER_NUM],
f1.[CONCEPT_CD],
f1.[PROVIDER_ID],
f1.[START_DATE],
f1.[MODIFIER_CD],
f1.[INSTANCE_NUM]
FROM
[dbo].[I2B2_OBSERVATION_FACT] f1
INNER JOIN [dbo].[I2B2_OBSERVATION_FACT] f2 ON
f1.[ENCOUNTER_NUM] = f2.[ENCOUNTER_NUM]
AND f1.[CONCEPT_CD] = f2.[CONCEPT_CD]
AND f1.[PROVIDER_ID] = f2.[PROVIDER_ID]
AND f1.[START_DATE] = f2.[START_DATE]
AND f1.[MODIFIER_CD] = f2.[MODIFIER_CD]
AND f1.[INSTANCE_NUM] = f2.[INSTANCE_NUM]
答案 0 :(得分:8)
不确定这是多快多少,但值得一试。
SELECT
COUNT(*) AS Dupes,
f1.[ENCOUNTER_NUM],
f1.[CONCEPT_CD],
f1.[PROVIDER_ID],
f1.[START_DATE],
f1.[MODIFIER_CD],
f1.[INSTANCE_NUM]
FROM
[dbo].[I2B2_OBSERVATION_FACT] f1
GROUP BY
f1.[ENCOUNTER_NUM],
f1.[CONCEPT_CD],
f1.[PROVIDER_ID],
f1.[START_DATE],
f1.[MODIFIER_CD],
f1.[INSTANCE_NUM]
HAVING
COUNT(*) > 1