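I have a T-SQL script that takes several hours to run. I know I need to look into the import process to stop duplicates from being inserted, but for now I just want to delete all but one record for each set of duplicate values. ParameterValueId is the primary key of the table, but there are many duplicate entries that need to be deleted. I only need one record for each combination of ParameterId, SiteId, MeasurementDateTime, and ParameterValue. Below is my current way of deleting the duplicates. It finds every combination of values with a count > 1. It then finds the first Id with those values and deletes all records that have the same values but do not match that first Id. Apart from the PRINT statements, is there a more efficient way to do this? Can I get rid of the cursor to improve performance?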
BEGIN TRANSACTION
SET NOCOUNT ON
DECLARE @BeginningRecordCount INT
SET @BeginningRecordCount =
(
SELECT COUNT(*)
FROM ParameterValues
)
DECLARE @ParameterId UNIQUEIDENTIFIER
DECLARE @SiteId UNIQUEIDENTIFIER
DECLARE @MeasurementDateTime DATETIME
DECLARE @ParameterValue FLOAT
DECLARE CDuplicateValues CURSOR FOR
SELECT
[ParameterId]
,[SiteId]
,[MeasurementDateTime]
,[ParameterValue]
FROM [ParameterValues]
GROUP BY
[ParameterId]
,[SiteId]
,[MeasurementDateTime]
,[ParameterValue]
HAVING COUNT(*) > 1
OPEN CDuplicateValues
FETCH NEXT FROM CDuplicateValues INTO
@ParameterId
,@SiteId
,@MeasurementDateTime
,@ParameterValue
DECLARE @FirstParameterValueId UNIQUEIDENTIFIER
DECLARE @DuplicateRecordsDeleting INT
WHILE @@FETCH_STATUS <> -1
BEGIN
SET @FirstParameterValueId =
(
SELECT TOP 1 ParameterValueId
FROM ParameterValues
WHERE
ParameterId = @ParameterId
AND SiteId = @SiteId
AND MeasurementDateTime = @MeasurementDateTime
AND ParameterValue = @ParameterValue
)
SET @DuplicateRecordsDeleting =
(
SELECT COUNT(*)
FROM ParameterValues
WHERE
ParameterId = @ParameterId
AND SiteId = @SiteId
AND MeasurementDateTime = @MeasurementDateTime
AND ParameterValue = @ParameterValue
AND ParameterValueId <> @FirstParameterValueId
)
PRINT 'DELETING ' + CAST(@DuplicateRecordsDeleting AS NVARCHAR(50))
+ ' records with values ParameterId : ' + CAST(@ParameterId AS NVARCHAR(50))
+ ', SiteId : ' + CAST (@SiteId AS NVARCHAR(50))
+ ', MeasurementDateTime : ' + CAST(@MeasurementDateTime AS NVARCHAR(50))
+ ', ParameterValue : ' + CAST(@ParameterValue AS NVARCHAR(50))
DELETE FROM ParameterValues
WHERE
ParameterId = @ParameterId
AND SiteId = @SiteId
AND MeasurementDateTime = @MeasurementDateTime
AND ParameterValue = @ParameterValue
AND ParameterValueId <> @FirstParameterValueId
FETCH NEXT FROM CDuplicateValues INTO
@ParameterId
,@SiteId
,@MeasurementDateTime
,@ParameterValue
END
CLOSE CDuplicateValues
DEALLOCATE CDuplicateValues
DECLARE @EndingRecordCount INT
SET @EndingRecordCount =
(
SELECT COUNT(*)
FROM ParameterValues
)
PRINT 'Beginning Record Count : ' + CAST(@BeginningRecordCount AS NVARCHAR(50))
PRINT 'Ending Record Count : ' + CAST(@EndingRecordCount AS NVARCHAR(50))
PRINT 'Total Records Deleted : ' + CAST((@BeginningRecordCount - @EndingRecordCount) AS NVARCHAR(50))
SET NOCOUNT OFF
PRINT 'RUN THE COMMIT OR ROLLBACK STATEMENT AFTER VERIFYING DATA.'
--COMMIT
--ROLLBACK
Answer 0 (score: 1)
You can do it in a single SQL statement:
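-- Keeps the row with the highest ParameterValueId in each group and deletes the rest.
-- Note: the equality join never matches NULLs, so this assumes the four grouping
-- columns are NOT NULL; rows with a NULL in any of them would all be deleted.
DELETE p FROM ParameterValues p
LEFT JOIN
(SELECT ParameterId, SiteId, MeasurementDateTime, ParameterValue, MAX(ParameterValueId) AS MAX_PARAM
FROM ParameterValues
GROUP BY ParameterId, SiteId, MeasurementDateTime, ParameterValue
) m
ON m.ParameterId = p.ParameterId
AND m.SiteId = p.SiteId
AND m.MeasurementDateTime = p.MeasurementDateTime
AND m.ParameterValue = p.ParameterValue
AND m.MAX_PARAM = p.ParameterValueId
WHERE m.ParameterId IS NULL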
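Since the question's script leaves the transaction open so the data can be verified, it may help to preview what the statement above would remove before running it. The following is only a sketch: it runs the same join as a SELECT, and it assumes the four grouping columns are NOT NULL, because the equality join does not match NULLs.

```sql
-- Preview only: lists the rows the DELETE would remove, without changing any data.
-- Assumption: ParameterId, SiteId, MeasurementDateTime and ParameterValue are NOT NULL.
SELECT p.*
FROM ParameterValues p
LEFT JOIN
(
    -- One row per group, carrying the id of the record that will be kept.
    SELECT ParameterId, SiteId, MeasurementDateTime, ParameterValue,
           MAX(ParameterValueId) AS MAX_PARAM
    FROM ParameterValues
    GROUP BY ParameterId, SiteId, MeasurementDateTime, ParameterValue
) m
    ON  m.ParameterId = p.ParameterId
    AND m.SiteId = p.SiteId
    AND m.MeasurementDateTime = p.MeasurementDateTime
    AND m.ParameterValue = p.ParameterValue
    AND m.MAX_PARAM = p.ParameterValueId
WHERE m.ParameterId IS NULL   -- no match means this is not the kept row
ORDER BY p.ParameterId, p.SiteId, p.MeasurementDateTime;
```

If the preview returns rows that share no values with any other row, that would suggest NULLs in the grouping columns, and the join conditions would need to be adjusted before deleting.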
Of course, the DELETE will not print the per-group messages, but you can still print the row counts before and after.
Answer 1 (score: 1)
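Here is an option that uses a CTE and the OVER clause. The OUTPUT ... INTO clause saves information from the rows affected by the DELETE statement into the @delParameterValues table variable. Later in the body of the procedure, you can use this table to print the affected rows.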
DECLARE @delParameterValues TABLE
(
ParameterId UNIQUEIDENTIFIER,
SiteId UNIQUEIDENTIFIER,
MeasurementDateTime DATETIME,
ParameterValue FLOAT,
DeletedRecordCount int
)
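;WITH cte AS
(
-- rn numbers the rows within each duplicate group; cnt is the size of the group,
-- including the row that is kept. ORDER BY (SELECT NULL) imposes no particular order,
-- so which row of a group survives is arbitrary.
SELECT *, ROW_NUMBER() OVER (PARTITION BY [ParameterId],[SiteId],[MeasurementDateTime],[ParameterValue] ORDER BY (SELECT NULL)) AS rn,
COUNT(*) OVER (PARTITION BY [ParameterId],[SiteId],[MeasurementDateTime],[ParameterValue]) AS cnt
FROM [ParameterValues]
)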
DELETE cte
OUTPUT DELETED.[ParameterId],
DELETED.[SiteId],
DELETED.[MeasurementDateTime],
DELETED.[ParameterValue],
DELETED.cnt INTO @delParameterValues
WHERE rn != 1
SELECT DISTINCT *
FROM @delParameterValues
Demo on SQLFiddle
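To illustrate how the captured rows might be used for reporting, here is a small self-contained sketch of the CTE approach. It makes several assumptions: SQL Server 2008 or later, a temporary table #ParameterValuesDemo with made-up sample data standing in for ParameterValues, and a table variable that stores only the four value columns so that the per-group counts are computed afterwards with GROUP BY. It also orders by ParameterValueId, so the row with the lowest id in each group is the one that is kept.

```sql
-- Hypothetical stand-in for ParameterValues, filled with sample data.
CREATE TABLE #ParameterValuesDemo
(
    ParameterValueId    UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
    ParameterId         UNIQUEIDENTIFIER NOT NULL,
    SiteId              UNIQUEIDENTIFIER NOT NULL,
    MeasurementDateTime DATETIME NOT NULL,
    ParameterValue      FLOAT NOT NULL
);

DECLARE @P UNIQUEIDENTIFIER, @S UNIQUEIDENTIFIER;
SET @P = NEWID();
SET @S = NEWID();

-- Three identical measurements (two of them redundant) plus one unique row.
INSERT INTO #ParameterValuesDemo (ParameterId, SiteId, MeasurementDateTime, ParameterValue)
VALUES (@P, @S, '20130101', 1.5),
       (@P, @S, '20130101', 1.5),
       (@P, @S, '20130101', 1.5),
       (@P, @S, '20130102', 2.0);

-- Receives one row per deleted record.
DECLARE @delParameterValues TABLE
(
    ParameterId         UNIQUEIDENTIFIER,
    SiteId              UNIQUEIDENTIFIER,
    MeasurementDateTime DATETIME,
    ParameterValue      FLOAT
);

;WITH cte AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ParameterId, SiteId, MeasurementDateTime, ParameterValue
                              ORDER BY ParameterValueId) AS rn
    FROM #ParameterValuesDemo
)
DELETE cte
OUTPUT DELETED.ParameterId,
       DELETED.SiteId,
       DELETED.MeasurementDateTime,
       DELETED.ParameterValue
INTO @delParameterValues
WHERE rn <> 1;

-- One result row per duplicate group, with the number of records removed from it.
SELECT ParameterId, SiteId, MeasurementDateTime, ParameterValue,
       COUNT(*) AS RowsDeleted
FROM @delParameterValues
GROUP BY ParameterId, SiteId, MeasurementDateTime, ParameterValue;

-- Overall total, in the style of the question's summary output.
DECLARE @TotalDeleted INT;
SET @TotalDeleted = (SELECT COUNT(*) FROM @delParameterValues);
PRINT 'Total Records Deleted : ' + CAST(@TotalDeleted AS NVARCHAR(50));

DROP TABLE #ParameterValuesDemo;
```

The table variable is only visible within the batch or procedure that declares it, so the reporting statements have to run in the same batch as the DELETE.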