I have a database with 3 tables:
---- Contracts --------------------------------------
[PK]ContractID
[FK]DebtorID
Other (like DateStart, DateEnd, ContractStatus, etc.)
-----------------------------------------------------
---- Debtors ----------------------------------------
[PK]DebtorID
[FK]ContactID
DebNr
-----------------------------------------------------
---- Contacts ---------------------------------------
[PK]ContactID
ContactType (0 = person, 1 = company)
ContactNote
Name
-----------------------------------------------------
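The next sketch recreates the three tables so the later examples have something concrete to run against. It uses SQLite through Python's built-in `sqlite3` module as a stand-in for SQL Server. The column types, the `DebNr` values and the decision to leave out the "Other" Contracts columns are all assumptions. The join in `listing` returns one row per contract, in the same shape as the ContractID/DebtorID/ContactID/DebtorName example rows below.

```python
import sqlite3

# Hypothetical stand-in for the three tables, in SQLite rather than SQL Server.
# Column types are assumptions; the "Other" Contracts columns
# (DateStart, DateEnd, ContractStatus, ...) are left out.
SCHEMA = """
CREATE TABLE Contacts (
    ContactID   INTEGER PRIMARY KEY,
    ContactType INTEGER NOT NULL,   -- 0 = person, 1 = company
    ContactNote TEXT,
    Name        TEXT NOT NULL
);
CREATE TABLE Debtors (
    DebtorID  INTEGER PRIMARY KEY,
    ContactID INTEGER NOT NULL REFERENCES Contacts(ContactID),
    DebNr     TEXT
);
CREATE TABLE Contracts (
    ContractID INTEGER PRIMARY KEY,
    DebtorID   INTEGER NOT NULL REFERENCES Debtors(DebtorID)
);
"""


def build(conn):
    """Create the tables and load rows that mirror the 'Philips' example."""
    conn.executescript(SCHEMA)
    # DebNr values are made up for illustration.
    conn.executemany("INSERT INTO Contacts VALUES (?, ?, ?, ?)",
                     [(1, 1, None, "Philips"), (9, 1, None, "Philips")])
    conn.executemany("INSERT INTO Debtors VALUES (?, ?, ?)",
                     [(1, 1, "D001"), (3, 9, "D003")])
    conn.executemany("INSERT INTO Contracts VALUES (?, ?)",
                     [(1, 1), (8, 3)])


def listing(conn):
    """Return (ContractID, DebtorID, ContactID, DebtorName), one row per contract."""
    return conn.execute("""
        SELECT c.ContractID, d.DebtorID, d.ContactID, ct.Name AS DebtorName
        FROM Contracts AS c
        JOIN Debtors  AS d  ON d.DebtorID   = c.DebtorID
        JOIN Contacts AS ct ON ct.ContactID = d.ContactID
        ORDER BY c.ContractID
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    for row in listing(conn):
        print(row)
```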
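It's a very simple design. I took an old database and migrated its data into this new structure. The only problem is that the data needs cleaning up: debtors with the same name occur more than once, for example: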
ContractID: 1 DebtorID: 1 ContactID: 1 DebtorName: Philips
ContractID: 8 DebtorID: 3 ContactID: 9 DebtorName: Philips
Obviously these two debtors are the same, so I used SSIS and T-SQL Fuzzy Grouping to give the matching debtors the same ContactID. The data now looks like this:
ContractID: 1 DebtorID: 1 ContactID: 1 DebtorName: Philips
ContractID: 8 DebtorID: 3 ContactID: 1 DebtorName: Philips
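So 'Philips' now exists only once in the database, but there are still two DebtorIDs referencing the same ContactID, which is not ideal. Next I want to update the Contracts table so that both contracts reference the same DebtorID, after which I can delete the duplicate debtors. What I basically want to achieve is this: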
ContractID: 1 DebtorID: 1 ContactID: 1 DebtorName: Philips
ContractID: 8 DebtorID: 1 ContactID: 1 DebtorName: Philips
I wrote the following T-SQL to achieve this:
DECLARE @MINID INT
DECLARE @MAXID INT
DECLARE @DEBTORID INT
DECLARE @CONTACTID INT

/* PUT ALL CONTACTIDS THAT OCCUR MORE THAN ONCE INTO #TEMP1 AND ADD A ROW NUMBER SO WE CAN GO DOWN THE LIST ONE BY ONE */
SELECT ROW_NUMBER() OVER (ORDER BY ContactID) AS RNUM,
       ContactID AS ContactID,
       COUNT(*) AS AmountDuplicates
INTO #TEMP1
FROM Debtors
GROUP BY ContactID
HAVING COUNT(*) > 1

SET @MINID = (SELECT MIN(RNUM) FROM #TEMP1)
SET @MAXID = (SELECT MAX(RNUM) FROM #TEMP1)

WHILE @MINID <= @MAXID
BEGIN
    /* SELECT THE CONTACTID OF THIS ITERATION */
    SELECT @CONTACTID = ContactID
    FROM #TEMP1
    WHERE RNUM = @MINID

    /* SELECT THE LOWEST DEBTORID FOR THIS (DUPLICATED) CONTACTID */
    SELECT TOP(1) @DEBTORID = DebtorID
    FROM Debtors
    WHERE ContactID = @CONTACTID
    ORDER BY DebtorID

    /* POINT ALL CONTRACTS OF THIS CONTACTID TO THAT LOWEST DEBTORID */
    UPDATE C
    SET DebtorID = @DEBTORID
    FROM Contracts C
    INNER JOIN Debtors D ON C.DebtorID = D.DebtorID
    WHERE D.ContactID = @CONTACTID

    /* MOVE ON TO THE NEXT RNUM IN #TEMP1 */
    SET @MINID = @MINID + 1
END

DROP TABLE #TEMP1
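The loop should be expressible as a single set-based UPDATE that points each contract at the lowest DebtorID sharing its debtor's ContactID. The sketch below illustrates that idea. It has not been tested against SQL Server 2008 R2. It runs in SQLite via Python's `sqlite3` module, on hypothetical data representing the state after fuzzy grouping; the "Acme" debtor is invented to show that non-duplicates are left alone. It uses a correlated subquery because older SQLite builds lack `UPDATE ... FROM`. In T-SQL the same statement is commonly written as `UPDATE ... FROM`, joining Contracts to Debtors and to a derived table of `MIN(DebtorID)` grouped by ContactID.

```python
import sqlite3

# Hypothetical data after fuzzy grouping: DebtorIDs 1 and 3 share ContactID 1
# (Philips); DebtorID 2 / ContactID 2 (an invented "Acme") is not duplicated.
SETUP = """
CREATE TABLE Debtors   (DebtorID INTEGER PRIMARY KEY, ContactID INTEGER NOT NULL);
CREATE TABLE Contracts (ContractID INTEGER PRIMARY KEY, DebtorID INTEGER NOT NULL);
INSERT INTO Debtors   VALUES (1, 1), (2, 2), (3, 1);
INSERT INTO Contracts VALUES (1, 1), (5, 2), (8, 3);
"""

# Re-point each contract to the lowest DebtorID with the same ContactID.
# WHERE EXISTS restricts the update to contracts whose debtor is not already
# the lowest one, i.e. the same rows the WHILE loop ends up changing.
REPOINT_CONTRACTS = """
UPDATE Contracts
SET DebtorID = (
    SELECT MIN(d2.DebtorID)
    FROM Debtors AS d1
    JOIN Debtors AS d2 ON d2.ContactID = d1.ContactID
    WHERE d1.DebtorID = Contracts.DebtorID
)
WHERE EXISTS (
    SELECT 1
    FROM Debtors AS d1
    JOIN Debtors AS d2 ON d2.ContactID = d1.ContactID
    WHERE d1.DebtorID = Contracts.DebtorID
      AND d2.DebtorID < d1.DebtorID
)
"""


def run():
    """Apply the set-based update and return (ContractID, DebtorID) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.execute(REPOINT_CONTRACTS)
    return conn.execute(
        "SELECT ContractID, DebtorID FROM Contracts ORDER BY ContractID"
    ).fetchall()


if __name__ == "__main__":
    print(run())  # [(1, 1), (5, 2), (8, 1)]
```

This removes both the temp table and the row-by-row iteration. The database evaluates the "lowest DebtorID per ContactID" lookup for every contract in one statement.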
Finally, I delete the debtors that are no longer referenced by the Contracts table:
DELETE
FROM Debtors
WHERE DebtorID NOT IN (SELECT DebtorID FROM Contracts)
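One caveat with the `NOT IN` form is worth checking. If Contracts.DebtorID can be NULL (the post does not say whether it can), then `DebtorID NOT IN (SELECT DebtorID FROM Contracts)` evaluates to UNKNOWN for every debtor without a contract, and nothing is deleted. `NOT EXISTS` does not have this problem. The sketch below demonstrates the difference in SQLite via Python's `sqlite3` module, using hypothetical data that includes a contract with a NULL DebtorID. SQL Server applies the same three-valued logic under its default `ANSI_NULLS ON` setting.

```python
import sqlite3

# Hypothetical data: Debtor 3 is orphaned after re-pointing, and contract 9 has
# a NULL DebtorID (assumes the column is nullable, which the post does not state).
SETUP = """
CREATE TABLE Debtors   (DebtorID INTEGER PRIMARY KEY, ContactID INTEGER NOT NULL);
CREATE TABLE Contracts (ContractID INTEGER PRIMARY KEY, DebtorID INTEGER);
INSERT INTO Debtors   VALUES (1, 1), (2, 2), (3, 1);
INSERT INTO Contracts VALUES (1, 1), (5, 2), (8, 1), (9, NULL);
"""

# The delete from the post: a NULL in the subquery makes NOT IN yield UNKNOWN.
DELETE_NOT_IN = """
DELETE FROM Debtors
WHERE DebtorID NOT IN (SELECT DebtorID FROM Contracts)
"""

# Correlated NOT EXISTS: unaffected by NULLs in Contracts.DebtorID.
DELETE_NOT_EXISTS = """
DELETE FROM Debtors
WHERE NOT EXISTS (
    SELECT 1 FROM Contracts AS c WHERE c.DebtorID = Debtors.DebtorID
)
"""


def remaining_after(delete_sql):
    """Run one delete variant on fresh data and return the surviving DebtorIDs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.execute(delete_sql)
    rows = conn.execute("SELECT DebtorID FROM Debtors ORDER BY DebtorID")
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(remaining_after(DELETE_NOT_IN))      # [1, 2, 3]: nothing deleted
    print(remaining_after(DELETE_NOT_EXISTS))  # [1, 2]: orphan 3 removed
```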
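I can confirm this does the job, but out of curiosity: is there a simpler way to achieve the same result, with fewer operations and fewer detours? I'm working in MS SQL Server 2008 R2.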