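I have a table containing a list of items with many duplicates. I am working on a stored procedure to merge each set of duplicates into a single record. Each duplicate has rows in a number of child tables, and those rows should either be deleted or re-keyed to point to the surviving record. My table has an Id, but ReadableIdentifier is the column I need to deduplicate on.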
Id | ReadableIdentifier | Name | UpdatedOn
1 | ABC1234 | Product X | 2014-04-25 16:00:08.000
2 | ABC1234 | Product X | 2014-04-28 16:00:08.000
3 | ABC1234 | Product X | 2014-04-21 16:00:08.000
4 | ABDD9945 | Widget R | 2014-04-25 16:00:08.000
5 | ABDD9945 | Widget R | 2014-04-25 18:45:08.000
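As you can see, records 1-3 are duplicates with different Ids and UpdatedOn dates. The same goes for 4-5. I need to merge each group into one record, keeping the record with the most recent UpdatedOn date.
End goal (child tables not shown):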
Id | ReadableIdentifier | Name | UpdatedOn
2 | ABC1234 | Product X | 2014-04-28 16:00:08.000
5 | ABDD9945 | Widget R | 2014-04-25 18:45:08.000
I am currently using a CURSOR to do this, but I would like to know whether there is a better solution.
DECLARE dupeCursor CURSOR
FAST_FORWARD
FOR
WITH Counts AS (
SELECT
COUNT(1) Count,
ReadableIdentifier
FROM dbo.Item WITH (NOLOCK)
WHERE ReadableIdentifier IS NOT NULL
GROUP BY ReadableIdentifier)
SELECT
-- one column only, to match the single variable in FETCH ... INTO
Counts.ReadableIdentifier
FROM
Counts
WHERE Counts.Count > 1;
OPEN dupeCursor;
DECLARE @readableId VARCHAR(50);
DECLARE @itemToPersistId INT, @itemToDeleteId INT;
FETCH NEXT FROM dupeCursor INTO @readableId;
WHILE @@FETCH_STATUS = 0
BEGIN
WITH V AS (
SELECT Id, ROW_NUMBER() OVER (PARTITION BY ReadableIdentifier ORDER BY UpdatedOn DESC) as Row
FROM dbo.Item WITH (NOLOCK) WHERE ReadableIdentifier = @readableId
)
SELECT @itemToPersistId = Id
FROM V
WHERE V.Row = 1;
-- Id is an INT in the sample data, so the temp table column matches that type
CREATE TABLE #itemsToDelete (Id INT);
INSERT INTO #itemsToDelete
SELECT Id
FROM dbo.Item WITH (NOLOCK)
WHERE ReadableIdentifier = @readableId AND Id != @itemToPersistId;
--UPDATE CHILDREN TABLES
DELETE FROM dbo.ItemDetails WHERE ItemId IN (SELECT Id FROM #itemsToDelete);
UPDATE dbo.ItemPurchases SET ItemId = @itemToPersistId
WHERE ItemId IN (SELECT Id FROM #itemsToDelete);
UPDATE dbo.PurchaseOrders SET ItemId = @itemToPersistId
WHERE ItemId IN (SELECT Id FROM #itemsToDelete);
DELETE FROM dbo.ItemMetadata WHERE ItemId IN (SELECT Id FROM #itemsToDelete);
--delete Duplicated Items
DELETE FROM dbo.Item WHERE Id IN (SELECT Id FROM #itemsToDelete);
DROP TABLE #itemsToDelete
FETCH NEXT FROM dupeCursor INTO @readableId;
END
CLOSE dupeCursor;
DEALLOCATE dupeCursor;
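I realize the cursor is most likely the problem, but I am not sure how to update all of the child tables without using a cursor.
Answer 0 (score: 1)
OK, I don't have data to test this against the child tables, but it should work: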
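-- Rank every item within its ReadableIdentifier group; Row = 1 is the record to keep.
-- NULL identifiers are excluded (as in the question) so they are not treated as one group.
WITH V
AS (SELECT *,
ROW_NUMBER() OVER(PARTITION BY ReadableIdentifier ORDER BY UpdatedOn DESC) AS Row
FROM dbo.Item WITH (NOLOCK)
WHERE ReadableIdentifier IS NOT NULL)
SELECT *
INTO #itemsToDelete
FROM V;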
--UPDATE CHILDREN TABLES
DELETE FROM dbo.ItemDetails
WHERE ItemId IN
(
SELECT Id
FROM #itemsToDelete
WHERE Row > 1
);
UPDATE IP
SET
IP.ItemId = itk.ID
FROM dbo.ItemPurchases AS IP
INNER JOIN #itemsToDelete AS itd ON IP.ItemId = itd.Id
AND itd.Row > 1
INNER JOIN #itemsToDelete AS itk ON itk.ReadableIdentifier = itd.ReadableIdentifier
AND itk.Row = 1;
UPDATE po
SET
po.ItemId = itk.ID
FROM dbo.PurchaseOrders AS po
INNER JOIN #itemsToDelete AS itd ON po.ItemId = itd.Id
AND itd.Row > 1
INNER JOIN #itemsToDelete AS itk ON itk.ReadableIdentifier = itd.ReadableIdentifier
AND itk.Row = 1;
DELETE FROM dbo.ItemMetadata
WHERE ItemId IN
(
SELECT Id
FROM #itemsToDelete
WHERE Row > 1
);
--delete Duplicated Items
DELETE FROM dbo.Item
WHERE Id IN
(
SELECT Id
FROM #itemsToDelete
WHERE Row > 1
);
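The set-based approach above can be tried out without touching real tables. The following is only a minimal sketch under stated assumptions: @Item and @ItemPurchases are hypothetical table variables standing in for dbo.Item and dbo.ItemPurchases, the purchase rows are made up for illustration, and @Map (DeleteId to KeepId) is a helper introduced here rather than part of the original answer. It assumes SQL Server 2012 or later for FIRST_VALUE, and it adds Id as a tie-breaker in case two duplicates share the same UpdatedOn.

```sql
-- Hypothetical stand-ins for dbo.Item and one child table (not the real schema).
DECLARE @Item TABLE (
    Id INT PRIMARY KEY,
    ReadableIdentifier VARCHAR(50) NULL,
    Name VARCHAR(100),
    UpdatedOn DATETIME
);
DECLARE @ItemPurchases TABLE (PurchaseId INT PRIMARY KEY, ItemId INT);
-- Helper mapping (an assumption of this sketch): duplicate Id -> surviving Id.
DECLARE @Map TABLE (DeleteId INT PRIMARY KEY, KeepId INT NOT NULL);

INSERT INTO @Item (Id, ReadableIdentifier, Name, UpdatedOn) VALUES
(1, 'ABC1234',  'Product X', '2014-04-25T16:00:08'),
(2, 'ABC1234',  'Product X', '2014-04-28T16:00:08'),
(3, 'ABC1234',  'Product X', '2014-04-21T16:00:08'),
(4, 'ABDD9945', 'Widget R',  '2014-04-25T16:00:08'),
(5, 'ABDD9945', 'Widget R',  '2014-04-25T18:45:08');

-- Made-up child rows, some pointing at duplicates and some at survivors.
INSERT INTO @ItemPurchases (PurchaseId, ItemId) VALUES
(10, 1), (11, 3), (12, 4), (13, 5);

-- Rank each group by newest UpdatedOn and record which Id survives.
WITH Ranked AS (
    SELECT Id,
           ROW_NUMBER() OVER (PARTITION BY ReadableIdentifier
                              ORDER BY UpdatedOn DESC, Id DESC) AS RowNum,
           FIRST_VALUE(Id) OVER (PARTITION BY ReadableIdentifier
                                 ORDER BY UpdatedOn DESC, Id DESC) AS KeepId
    FROM @Item
    WHERE ReadableIdentifier IS NOT NULL
)
INSERT INTO @Map (DeleteId, KeepId)
SELECT Id, KeepId
FROM Ranked
WHERE RowNum > 1;

-- Re-key child rows from each duplicate to its surviving record.
UPDATE ip
SET ip.ItemId = m.KeepId
FROM @ItemPurchases AS ip
INNER JOIN @Map AS m ON ip.ItemId = m.DeleteId;

-- Remove the duplicates themselves.
DELETE i
FROM @Item AS i
INNER JOIN @Map AS m ON i.Id = m.DeleteId;

-- Inspect the result: items 2 and 5 remain; purchases 10 and 11 now
-- reference item 2, purchases 12 and 13 reference item 5.
SELECT Id, ReadableIdentifier, Name, UpdatedOn FROM @Item ORDER BY Id;
SELECT PurchaseId, ItemId FROM @ItemPurchases ORDER BY PurchaseId;
```

Against the real tables it would probably be safer to run the child updates and the final delete inside a single transaction, and to build the mapping without NOLOCK, since that hint allows dirty reads and the mapping decides which rows get deleted.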