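I have a master table of users that is linked to various other tables. Sometimes, because of bad import data, this master table contains duplicates, and I would like to merge them. See the tables below.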
Table: Users
UserID  Username   FirstName  LastName
1       Main       John       Doe
2       Duplicate  John       Doo
Table: Records1
RecordID  RecordName     CreatedUserID  UpdatedUserID
1         Test record 1  1              2
2         Test record 2  2              null
3         Test record 3  2              null
CreatedUserID and UpdatedUserID are foreign key columns referencing Users.UserID.
So currently, if I want to merge users 1 and 2, I can do it with the following SQL statements:
UPDATE Records1 SET UpdatedUserID = 1 WHERE UpdatedUserID = 2
UPDATE Records1 SET CreatedUserID = 1 WHERE CreatedUserID = 2
DELETE FROM Users WHERE UserID = 2
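This is just a sample subset; in reality there are many related record tables, and I would have to add further UPDATE statements for each of them.
I know I may be pushing my luck here, but is there perhaps a way to do the above (bulk-update all related tables and delete the "duplicate" record) without manually updating every foreign key column in every related table? The Users table is basically the base table that all other tables link to, so writing separate statements for each table is rather tedious. If a shortcut is available, that would be great.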
Answer 0: (score: 0)
Does this help?
Create Table Users(Id int, UserName varchar(10),FirstName varchar(10), LastName Varchar(10))
Create Table Records1(RecordID int, RecordName varchar(20), CreatedUserID int, UpdatedUserID int)
INSERT INTO Users
SELECT 1,'Main','John','Doe' Union All
SELECT 2,'Duplicate','John','Doo' Union All
SELECT 3,'Main3','ABC','MPN' Union All
SELECT 4,'Duplicate','ABC','MPT'
Insert into Records1
SELECT 1,'Test record 1',1,2 Union All
SELECT 2,'Test record 2',2,null Union All
SELECT 3,'Test record 3',2,null Union All
SELECT 1,'Test record 1',3,4 Union All
SELECT 2,'Test record 2',4,null Union All
SELECT 3,'Test record 3',4,null
-- Pair each retained user with its duplicate.
-- This condition should be changed based on the criteria for identifying duplicates.
Select u1.Id as RetainedUserID, u2.Id as DuplicateUserID
Into #tmpUsers
from Users u1
JOIN Users u2
on u1.FirstName=u2.FirstName and u2.UserName='Duplicate'
Where u1.UserName<>'Duplicate'
-- Repoint UpdatedUserID from the duplicate to the retained user.
Update r
Set r.UpdatedUserID=u.RetainedUserID
From Records1 r
JOIN #tmpUsers u on r.UpdatedUserID=u.DuplicateUserID
-- Repoint CreatedUserID from the duplicate to the retained user.
Update r
Set r.CreatedUserID=u.RetainedUserID
From Records1 r
JOIN #tmpUsers u on r.CreatedUserID=u.DuplicateUserID
Delete from Users Where UserName='Duplicate'
Select * from Users
Select * from Records1
Drop Table #tmpUsers
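The script above rewrites a single table. When many tables reference Users, one possible shortcut is to let the catalog views list the referencing columns and generate the UPDATE statements for review. The following is a minimal sketch, not a tested solution. It assumes SQL Server, a Users table in the dbo schema, foreign key constraints that are actually declared (the sample tables above do not declare any), and single-column keys. The variable names and values are illustrative only.

```sql
-- Sketch: generate one UPDATE per foreign key column that references dbo.Users.
-- Assumptions: SQL Server, declared single-column foreign keys, Users in schema dbo.
declare @RetainedUserId int = 1,  -- example values taken from the question
        @VictimUserId   int = 2;

select
    'update ' + quotename(object_schema_name(fkc.parent_object_id)) + '.'
              + quotename(object_name(fkc.parent_object_id))
  + ' set '   + quotename(col_name(fkc.parent_object_id, fkc.parent_column_id))
  + ' = '     + cast(@RetainedUserId as varchar(11))
  + ' where ' + quotename(col_name(fkc.parent_object_id, fkc.parent_column_id))
  + ' = '     + cast(@VictimUserId as varchar(11)) + ';' as UpdateStatement
from sys.foreign_key_columns as fkc
where fkc.referenced_object_id = object_id('dbo.Users');
```

The query only produces the statements as text, so they can be inspected before being run, followed by the delete of the duplicate row from Users.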
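Answer 1: (score: 0)
Since the process of identifying duplicate accounts will be manual, there will (generally) be pairs of accounts to process. (I am assuming that the Inspector cannot tick off 15 user accounts as duplicates in your UI and submit the whole lot for processing.)
A stored procedure like the following may be a good start: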
create procedure MergeUsers
@RetainedUserId Int, -- UserId that is being kept.
@VictimUserId Int -- UserId that is to be removed.
as
begin
-- Validate the input.
-- Optional, but you may want some reality checks.
-- (Usernames are probably unique already, eh?)
declare @UsernameMatch as Int, @FirstNameMatch as Int, @LastNameMatch as Int, @EmailMatch as Int;
select
@UsernameMatch = case when R.Username = V.Username then 1 else 0 end,
@FirstNameMatch = case when R.FirstName = V.FirstName then 1 else 0 end,
@LastNameMatch = case when R.LastName = V.LastName then 1 else 0 end,
@EmailMatch = case when R.Email = V.Email then 1 else 0 end
from Users as R inner join
Users as V on V.UserId = @VictimUserId and R.UserId = @RetainedUserId;
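-- If either user id does not exist, the select assigns nothing and the sum is null; treat that as no match.
if coalesce( @UsernameMatch + @FirstNameMatch + @LastNameMatch + @EmailMatch, 0 ) < 2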
begin
-- The following message should be enhanced to provide a better clue as to which user
-- accounts are being processed and what did or didn't match.
-- Severity 16 marks a user-correctable error; severities 19 through 25 require WITH LOG.
RaisError( 'MergeUsers: The two user accounts should have something in common.', 16, 42 );
return;
end;
-- Update all of the related tables.
-- Using a single pass through each table and updating all of the appropriate columns may improve performance.
-- The case expression will only alter the values which reference the victim user account.
update Records1
set
CreatedUserId = case when CreatedUserId = @VictimUserId then @RetainedUserId else CreatedUserId end,
UpdatedUserId = case when UpdatedUserId = @VictimUserId then @RetainedUserId else UpdatedUserId end
where CreatedUserId = @VictimUserId or UpdatedUserId = @VictimUserId;
update Records2
set ...
where ...;
-- Houseclean Users .
delete from Users
where UserId = @VictimUserId;
end;
Caveat: adding try/catch and a transaction to the SP, so that a merge is an all-or-nothing operation, is left as an exercise.
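One way the all-or-nothing behavior could look is sketched below. This is an illustration under assumptions, not the answerer's implementation: the procedure name MergeUsersAtomic is hypothetical, the validation step is omitted for brevity, only Records1 is updated, and `throw` requires SQL Server 2012 or later.

```sql
-- Sketch only: MergeUsersAtomic is a hypothetical name; the table and column
-- names follow the question (Users, Records1).
create procedure MergeUsersAtomic
    @RetainedUserId int,  -- UserId that is being kept.
    @VictimUserId   int   -- UserId that is to be removed.
as
begin
    set nocount on;
    set xact_abort on;  -- most run-time errors then doom the whole transaction

    begin try
        begin transaction;

        -- Repoint both foreign key columns in a single pass.
        update Records1
        set CreatedUserId = case when CreatedUserId = @VictimUserId then @RetainedUserId else CreatedUserId end,
            UpdatedUserId = case when UpdatedUserId = @VictimUserId then @RetainedUserId else UpdatedUserId end
        where CreatedUserId = @VictimUserId or UpdatedUserId = @VictimUserId;

        -- Updates for the other related tables would go here.

        delete from Users
        where UserId = @VictimUserId;

        commit transaction;
    end try
    begin catch
        -- Undo every change made so far, then re-raise the original error.
        if @@trancount > 0
            rollback transaction;
        throw;
    end catch;
end;
```

With this shape, a failure in any update or in the final delete leaves both Users and the related tables as they were before the call.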