How to delete duplicates and update the records that reference those duplicates in SQL

Date: 2017-01-26 22:06:32

Tags: sql h2

I have two tables:

User:(int id, varchar unique username)

Items: (int id, varchar name, int user_id)

Currently the User table contains duplicates that differ only by case, for example:

1,John
2,john
3,sally
4,saLlY

The Items table then contains:

1,myitem,1
2,mynewitem,2
3,my-item,3
4,mynew-item,4

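I have already updated the code that inserts into the User table so that it always inserts lowercase usernames.

However, I need to migrate the database so that the duplicates are removed from the User table and the references in the Items table are updated, so that users do not lose access to their items.

I.e., after the migration the data would be:

User: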

1,john
3,sally

Items:

1,myitem,1
2,mynewitem,1
3,my-item,3
4,mynew-item,3

Because the User table has a unique constraint, I cannot simply lowercase the names with a statement like the one below:

update public.user set username =lower(username)
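
To make the failure concrete, here is a minimal sketch that reproduces it with Python's standard-library sqlite3 module. SQLite stands in for H2 here, and the table name users and the function name naive_lowercase_fails are assumptions chosen for illustration, so the exact error reported by H2 will differ.

```python
import sqlite3


def naive_lowercase_fails() -> bool:
    """Return True if the naive UPDATE is rejected by the unique constraint."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the User table from the question.
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "John"), (2, "john"), (3, "sally"), (4, "saLlY")],
    )
    try:
        # Lowercasing 'John' collides with the existing 'john' row.
        conn.execute("UPDATE users SET username = lower(username)")
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()
    return False


if __name__ == "__main__":
    print(naive_lowercase_fails())
```

The duplicates therefore have to be removed (and the item references repointed) before, or together with, the lowercasing.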

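7 answers:

Answer 0: (score: 2)

The following code was tested with "H2 1.3.176 (2014-04-05) / embedded mode" on the web console. Two queries solve the problem as you stated it, and there is one additional "preparing" statement for a case that, although it does not appear in your data, should be considered as well. The preparing statement is explained a little later; let's start with the two main queries:

First, every items.userid is rewritten to point at the corresponding lowercase user entry. Let's call the lowercase entries main and the non-lowercase entries dup. Every items.userid that refers to a dup.id is then set to the corresponding main.id. A main entry corresponds to a dup entry if a case-insensitive comparison of the names matches, i.e. main.name = lower(dup.name).

Second, all dup entries in the user table are deleted. A dup entry is one where name <> lower(name).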

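So much for the basic requirements. In addition, we should consider that for some users there may only be entries containing uppercase characters and no "lowercase entry". To handle this situation, a preparing statement is used that, for each group of names that are equal ignoring case, converts one name in the group to lowercase.

The complete script, including test data, follows: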

drop table if exists usr;

CREATE TABLE usr
    (`id` int primary key, `name` varchar(5))
;

INSERT INTO usr
    (`id`, `name`)
VALUES
    (1, 'John'),
    (2, 'john'),
    (3, 'sally'),
    (4, 'saLlY'),
    (5, 'Mary'),
    (6, 'mAry')

;

drop table if exists items;

CREATE TABLE items
    (`id` int, `name` varchar(10), `userid` int references usr (`id`))
;

INSERT INTO items
    (`id`, `name`, `userid`)
VALUES
    (1, 'myitem', 1),
    (2, 'mynewitem', 2),
    (3, 'my-item', 3),
    (4, 'mynew-item', 4)
;

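-- Preparing statement: for each group that has no all-lowercase entry,
-- lowercase the entry with the lowest id
update usr set name = lower(name) where id in (select min(ui.id) as minid from usr ui where lower(ui.name) not in (select ui2.name from usr ui2)
group by lower(name));

-- Repoint every item to the lowercase (main) entry of its user's group
update items set userid =
(select umain.id as mainid from usr udupl, usr umain
 where umain.name = lower(umain.name)
     and lower(udupl.name) = lower(umain.name)
     and udupl.id = userid
);

-- Delete the remaining non-lowercase (dup) entries
delete from usr where name <> lower(name);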

select * from usr;

select * from items;
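
As a rough cross-check of the three statements above, the sketch below runs equivalent SQL through Python's standard-library sqlite3 module. It uses SQLite rather than H2, and the function name run_three_step_migration and the slightly simplified queries are assumptions made for illustration, not the tested H2 script itself.

```python
import sqlite3

USERS = [(1, "John"), (2, "john"), (3, "sally"), (4, "saLlY"), (5, "Mary"), (6, "mAry")]
ITEMS = [(1, "myitem", 1), (2, "mynewitem", 2), (3, "my-item", 3), (4, "mynew-item", 4)]


def run_three_step_migration():
    """Prepare, repoint items, delete dups; return (users, items) afterwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usr (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE items (id INTEGER, name TEXT, userid INTEGER)")
    conn.executemany("INSERT INTO usr VALUES (?, ?)", USERS)
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", ITEMS)
    with conn:  # commit all steps together, or roll back on error
        # Preparing step: groups without a lowercase entry get one,
        # by lowercasing the entry with the lowest id.
        conn.execute("""
            UPDATE usr SET name = lower(name)
            WHERE id IN (
                SELECT min(id) FROM usr
                WHERE lower(name) NOT IN (SELECT name FROM usr)
                GROUP BY lower(name)
            )
        """)
        # Step 1: point each item at the lowercase (main) entry of its owner.
        conn.execute("""
            UPDATE items SET userid = (
                SELECT m.id FROM usr AS d
                JOIN usr AS m ON m.name = lower(d.name)
                WHERE d.id = items.userid
            )
        """)
        # Step 2: remove the non-lowercase (dup) entries.
        conn.execute("DELETE FROM usr WHERE name <> lower(name)")
    users = conn.execute("SELECT id, name FROM usr ORDER BY id").fetchall()
    items = conn.execute("SELECT id, name, userid FROM items ORDER BY id").fetchall()
    conn.close()
    return users, items


if __name__ == "__main__":
    for row in run_three_step_migration():
        print(row)
```

Note that this approach keeps whichever entry is already lowercase (id 2 for john), so the surviving ids can differ from the lowest-id result shown in the question.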

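Answer 1: (score: 2)

If you first update the item references correctly, you can then delete the duplicate users. In the example below, if you don't mind, I keep the user with the smallest ID as the correct one.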

--Prepare data
create TABLE #users  
(id int primary key, username varchar(15));

INSERT INTO #users
(id, username)
select
1, 'John'
union all select
2, 'john'
union all select
3, 'sally'
union all select
4, 'saLlY'
union all select
5, 'Mary'
union all select
6, 'mAry'


create TABLE #items  
(itemid int, name varchar(10), userid int references #users (id));

INSERT INTO #items
(itemid, name, userid)
select
1, 'myitem', 1
union all select
2, 'mynewitem', 2
union all select
3, 'my-item', 3
union all select
4, 'mynew-item', 4
;

--Update items: point every item at the lowest id within its case-insensitive username group
update #items 
set userid =minid 
from
 (
select minid,id from 
(
select min(id) as minid,lower(username) as newusername
from #users group by lower(username)) t inner join #users 
on t.newusername = lower(username)) t2 inner join #items on t2.id = userid


--delete duplicates users, according to minimum id
delete from #users where id not in (
select min(id) from #users group by lower(username))

--set the remaining users names to lower
update #users
set username = lower(username)

--Clean temp data
drop table #users
drop table #items 

This was tested in SQL Server, but since you asked for plain SQL, I think it should work for you.
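
For readers without SQL Server at hand, the same keep-the-lowest-id idea can be sketched with Python's standard-library sqlite3 module. This is an illustration under assumptions: SQLite instead of SQL Server, ordinary tables named users and items instead of temp tables, and a hypothetical function name dedupe_keep_min_id.

```python
import sqlite3


def dedupe_keep_min_id():
    """Keep the lowest id per case-insensitive username; return (users, items)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE)")
    conn.execute("CREATE TABLE items (itemid INTEGER, name TEXT, userid INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "John"), (2, "john"), (3, "sally"), (4, "saLlY")],
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [(1, "myitem", 1), (2, "mynewitem", 2), (3, "my-item", 3), (4, "mynew-item", 4)],
    )
    with conn:  # one transaction for the whole migration
        # 1. Repoint each item to the lowest id in its owner's name group.
        conn.execute("""
            UPDATE items SET userid = (
                SELECT min(k.id) FROM users AS u
                JOIN users AS k ON lower(k.username) = lower(u.username)
                WHERE u.id = items.userid
            )
        """)
        # 2. Delete every user that is not the lowest id of its group.
        conn.execute("""
            DELETE FROM users
            WHERE id NOT IN (SELECT min(id) FROM users GROUP BY lower(username))
        """)
        # 3. Only now is it safe to lowercase under the unique constraint.
        conn.execute("UPDATE users SET username = lower(username)")
    users = conn.execute("SELECT id, username FROM users ORDER BY id").fetchall()
    items = conn.execute("SELECT itemid, name, userid FROM items ORDER BY itemid").fetchall()
    conn.close()
    return users, items


if __name__ == "__main__":
    for row in dedupe_keep_min_id():
        print(row)
```

The order of the steps matters: repointing must happen before the delete, and the lowercasing comes last so that it cannot collide with a row that is about to be removed.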

Answer 2: (score: 1)

First update the items:

update i
set userid = u2.userid
from items i
   inner join users u on i.userid=u.userid
   inner join (select userid, lower(username) username, row_number() over (partition by lower(username) order by userid) rn from users) u2 on u2.username=lower(u.username) and u2.rn=1

Then create a new user table based on the original one:

select userid, lower(username) username 
into NewUserTable
from (select userid, username, row_number() over (partition by lower(username) order by userid) rn from users) u 
where rn=1

Answer 3: (score: 1)

This code works fine on SQL Server.

Try it, it should help you (you may need to make minor changes to suit your database engine):

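-- Pair each duplicate (id2) with a lower-id user that has the same name, ignoring case
-- ([User] is bracketed because USER is a reserved word in SQL Server)
SELECT U1.id,U2.id id2
INTO #User_Tmp
FROM [User] U1 JOIN [User] U2 
ON LOWER(U2.username) = LOWER(U1.username) 
AND U1.id < U2.id

-- Repoint items owned by a duplicate to the user that is kept
UPDATE It
SET It.user_id = U.id
FROM Items It
JOIN #User_Tmp U
ON U.id2 = It.user_id

DELETE FROM [User]
WHERE id IN 
(
    SELECT id2 FROM #User_Tmp
)

SELECT *
FROM [User]

SELECT *
FROM Items

DROP TABLE #User_Tmp;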

Hope this answers the question.

Answer 4: (score: 1)

BEGIN TRAN
CREATE TABLE #User (UserID Int, UserName Nvarchar(255))

INSERT INTO #USER
SELECT 1,'John' UNION ALL
SELECT 2,'John'  UNION ALL
SELECT 3,'sally' UNION ALL
SELECT 4,'saLlY'

CREATE TABLE #items  
(itemid int, name varchar(10), userid int );

INSERT INTO #items
(itemid, name, userid)
select
1, 'myitem', 1
union all select
2, 'mynewitem', 2
union all select
3, 'my-item', 3
union all select
4, 'mynew-item', 4

GO
WITH CTE (UserName, DuplicateCount)
AS
(
    -- Number the rows within each case-insensitive name group, lowest UserID first
    SELECT UserName,
    ROW_NUMBER() OVER(PARTITION BY LOWER(UserName)
    ORDER BY UserID) AS DuplicateCount
    FROM #User

)
Delete from CTE Where DuplicateCount > 1

Select * from #User

Select * from #items

ROLLBACK TRAN

Answer 5: (score: 1)

Try using the MERGE statement. With it you can find the duplicates and also update the values of the duplicates.

MERGE [INTO] <target table>

USING <source table or table expression>

ON <join/merge predicate> (semantics similar to outer join)

WHEN MATCHED THEN <statement to run when match found in target>

WHEN NOT MATCHED [BY TARGET] THEN <statement to run when no match found in target>

Answer 6: (score: 1)

I'm not proficient in H2. You can try this; it was written for SQL Server on a case-sensitive, accent-sensitive database.

create table t_user(id int not null identity(1,1), username varchar(25) unique);
alter table t_user add constraint pk_id_user primary key(id);

create table t_items(id int not null identity(1,1), name varchar(25), user_id int);
alter table t_items add constraint pk_id_items primary key(id);
alter table t_items add constraint fk_user_id foreign key(user_id) references t_user(id);

insert into t_user (username) values ('John'), ('john'), ('sally'), ('saLlY');
insert into t_items (name, user_id) values ('myitem', 1), ('mynewitem', 2), ('my-item', 3), ('mynew-item',4);

select * from t_user
select * from t_items

create table t_user_mig(id int not null identity(1,1), username varchar(25) unique);
alter table t_user_mig add constraint pk_id_user_mig primary key(id);

create table t_items_mig(id int not null identity(1,1), name varchar(25), user_id int);
alter table t_items_mig add constraint pk_id_items_mig primary key(id);
alter table t_items_mig add constraint fk_user_id_mig foreign key(user_id) references t_user_mig(id);

insert into t_user_mig select distinct lower(username) from t_user
insert into t_items_mig
select ti.name, (select id from t_user_mig where username = lower(tu.username)) 
from t_items ti, t_user tu 
where ti.user_id = tu.id

select * from t_user_mig
select * from t_items_mig

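I replaced your tables user and items with t_user and t_items. These tables are migrated to t_user_mig and t_items_mig.

You can try it in H2. I would appreciate your feedback.

I hope it helps.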
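
The copy-into-new-tables idea can also be tried quickly with Python's standard-library sqlite3 module. The sketch below is an illustration under assumptions: it uses SQLite rather than SQL Server or H2, INTEGER PRIMARY KEY in place of identity(1,1), and a hypothetical function name rebuild_into_new_tables. Because the new ids are assigned by the database, the check compares owners by username rather than by id.

```python
import sqlite3


def rebuild_into_new_tables():
    """Copy deduplicated users and their items into *_mig tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_user (id INTEGER PRIMARY KEY, username TEXT UNIQUE);
        CREATE TABLE t_items (id INTEGER PRIMARY KEY, name TEXT, user_id INTEGER);
        CREATE TABLE t_user_mig (id INTEGER PRIMARY KEY, username TEXT UNIQUE);
        CREATE TABLE t_items_mig (id INTEGER PRIMARY KEY, name TEXT, user_id INTEGER);
        INSERT INTO t_user (username) VALUES ('John'), ('john'), ('sally'), ('saLlY');
        INSERT INTO t_items (name, user_id)
            VALUES ('myitem', 1), ('mynewitem', 2), ('my-item', 3), ('mynew-item', 4);
    """)
    with conn:
        # One migrated user per distinct lowercase username.
        conn.execute(
            "INSERT INTO t_user_mig (username) SELECT DISTINCT lower(username) FROM t_user"
        )
        # Each item follows its old owner to the migrated user of the same name.
        conn.execute("""
            INSERT INTO t_items_mig (name, user_id)
            SELECT ti.name, m.id
            FROM t_items AS ti
            JOIN t_user AS tu ON ti.user_id = tu.id
            JOIN t_user_mig AS m ON m.username = lower(tu.username)
        """)
    usernames = sorted(r[0] for r in conn.execute("SELECT username FROM t_user_mig"))
    owners = sorted(conn.execute("""
        SELECT i.name, u.username
        FROM t_items_mig AS i JOIN t_user_mig AS u ON u.id = i.user_id
    """).fetchall())
    conn.close()
    return usernames, owners


if __name__ == "__main__":
    print(rebuild_into_new_tables())
```

A trade-off of this design is that user ids are not preserved, so anything else that stores the old ids would need the same remapping.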