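I have four tables: City, Location, Customer, and Shop. Whoever designed this database made the primary keys auto-increment, and as a result redundant data has accumulated in the database over time. I am trying to clean up the database, but updating and deleting rows takes a very long time. Samples of the tables are shown below: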
Table 1: City: ID_city(PK)
| City | ID_City |
|-----------|---------|
| Chennai | 1 |
| Bengaluru | 2 |
| Chennai | 3 |
| Delhi | 4 |
| Chennai | 5 |
| Bengaluru | 6 |
Table 2: Location: ID_Location(PK), ID_City(FK)
| Zip | ID_Location | ID_City |
|------|--------------------|---------|
| 0001 | 1 | 1 |
| 0011 | 2 | 2 |
| 0002 | 3 | 1 |
| 0021 | 4 | 3 |
| 0003 | 5 | 1 |
| 0012 | 6 | 2 |
| 0001 | 7 (duplicate of 1) | 1 |
Table 3: Customer: Cust_ID(PK), ID_Location(FK)
| Cust_ID | ID_Location |
|---------|-------------|
| 1 | 1 |
| 2 | 3 |
| 3 | 5 |
| 4 | 2 |
| 5 | 7 |
Table 4: Shop: Shop_ID(PK), ID_Location(FK)
| Shop_ID | ID_Location |
|---------|-------------|
| 1 | 1 |
| 2 | 2 |
| 3 | 6 |
| 4 | 3 |
| 5 | 7 |
Expected tables:
Table 1: City: ID_city(PK)
| City | ID_City |
|-----------|---------|
| Chennai | 1 |
| Bengaluru | 2 |
| Delhi | 4 |
Table 2: Location: ID_Location(PK), ID_City(FK)
| Zip | ID_Location | ID_City |
|------|-------------|---------|
| 0001 | 1 | 1 |
| 0011 | 2 | 2 |
| 0002 | 3 | 1 |
| 0021 | 4 | 1 |
| 0003 | 5 | 1 |
| 0012 | 6 | 2 |
Table 3: Customer: Cust_ID(PK), ID_Location(FK)
| Cust_ID | ID_Location |
|---------|-------------|
| 1 | 1 |
| 2 | 3 |
| 3 | 5 |
| 4 | 2 |
| 5 | 1 |
Table 4: Shop: Shop_ID(PK), ID_Location(FK)
| Shop_ID | ID_Location |
|---------|-------------|
| 1 | 1 |
| 2 | 2 |
| 3 | 6 |
| 4 | 3 |
| 5 | 1 |
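As you can see, there are duplicate records everywhere, and removing a single duplicate city takes 3 UPDATE statements (with joins) and 2 DELETE statements. Is there a way to reduce the number of SQL statements needed for this task?
The queries I wrote are:
That removes just one duplicate city, and the City table has about 1,300 duplicates. Is there a simple way to check for duplicates, update the references, and finally delete the duplicates?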
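Answer 0: (score: 1)
You can update an entire table in one statement based on a condition. In your case, the condition is that another row with the same value exists.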
-- (1) UPDATE DUPLICATE CITIES ON LOCATION
UPDATE l SET l.Id_City = mstr.Id_City
-- SELECT c.Id_City oldId, mstr.Id_City newId -- Check this for your convenience
FROM [Location] l
INNER JOIN City c ON c.Id_City = l.Id_City
INNER JOIN (
SELECT City, MIN(Id_City) Id_City -- KEEP FIRST ONLY
FROM City
GROUP BY City
HAVING COUNT(1) > 1
) mstr ON mstr.City = c.City
AND mstr.Id_City < c.Id_City -- qualify with c: Id_City alone is ambiguous here
-- (2) DELETE DUPLICATE CITIES
DELETE c
-- SELECT c.Id_City oldId, mstr.Id_City newId -- Check this for your convenience
FROM City c
INNER JOIN (
SELECT City, MIN(Id_City) Id_City -- KEEP FIRST ONLY
FROM City
GROUP BY City
HAVING COUNT(1) > 1
) mstr ON mstr.City = c.City
AND mstr.Id_City < c.Id_City -- qualify with c: Id_City alone is ambiguous here
-- ...
The remaining queries can be written along the same lines as these examples.
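As a rough sketch of what those remaining queries might look like for the Location duplicates: this assumes steps (1) and (2) have already run, and that two Location rows count as duplicates when they share both `Zip` and `ID_City` (that definition is my assumption, not something the question states). The temp table name `#LocationMap` and the `oldId`/`newId` column names are made up for illustration.

```sql
-- (3) BUILD A MAP FROM EACH DUPLICATE LOCATION TO THE ROW THAT IS KEPT
-- Assumption: duplicates are rows sharing the same (Zip, ID_City) pair.
SELECT l.ID_Location AS oldId, mstr.ID_Location AS newId
INTO #LocationMap                        -- hypothetical temp table name
FROM [Location] l
INNER JOIN (
    SELECT Zip, ID_City, MIN(ID_Location) AS ID_Location -- KEEP FIRST ONLY
    FROM [Location]
    GROUP BY Zip, ID_City
    HAVING COUNT(1) > 1
) mstr ON mstr.Zip = l.Zip
      AND mstr.ID_City = l.ID_City
      AND mstr.ID_Location < l.ID_Location;

-- (4) REPOINT CUSTOMERS FROM DUPLICATE LOCATIONS TO THE KEPT ONE
UPDATE c SET c.ID_Location = m.newId
FROM Customer c
INNER JOIN #LocationMap m ON m.oldId = c.ID_Location;

-- (5) REPOINT SHOPS THE SAME WAY
UPDATE s SET s.ID_Location = m.newId
FROM Shop s
INNER JOIN #LocationMap m ON m.oldId = s.ID_Location;

-- (6) DELETE THE DUPLICATE LOCATIONS, WHICH ARE NO LONGER REFERENCED
DELETE l
FROM [Location] l
INNER JOIN #LocationMap m ON m.oldId = l.ID_Location;

DROP TABLE #LocationMap;
```

Building the map once and reusing it for both updates and the delete keeps the set of duplicates fixed while the statements run. On the sample data it maps location 7 to location 1, which matches the expected tables in the question.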
Answer 1: (score: 0)
Not perfect, but you should be able to work from here.
declare @City table (ID_city int primary key, City varchar(10));
insert into @city values
(1, 'Chennai')
, (2, 'Bengaluru')
, (3, 'Chennai')
, (4, 'Delhi')
, (5, 'Chennai')
, (6, 'Bengaluru');
--select * from @city c order by c.City, c.ID_city;
declare @Location table (ID_Location int primary key, ID_City int, zip char(4))
insert into @Location values
(1, 1, '0001')
, (2, 2, '0011')
, (3, 1, '0002')
, (4, 3, '0021')
, (5, 1, '0003')
, (6, 2, '0012')
, (7, 1, '0001'); --duplicate
--select * from @Location l order by l.ID_Location;
declare @Customer table (Cust_ID int primary key, ID_Location int)
insert into @Customer values
(1, 1)
, (2, 3)
, (3, 5)
, (4, 2)
, (5, 7);
--select * from @Customer;
declare @Shop table (Shop_ID int primary key, ID_Location int)
insert into @Shop values
(1, 1)
, (2, 2)
, (3, 6)
, (4, 3)
, (5, 7);
--select * from @Shop s order by s.Shop_ID;
declare @LocationMap table (ID_Location int primary key, ID_City int, zip char(4), cnt int, rn int)
insert into @LocationMap
select l.*
, count(*) over (partition by zip) as cnt
, ROW_NUMBER() over (partition by zip order by ID_Location) as rn
from @Location l;
--select * from @LocationMap where cnt > 1 order by zip, rn;
declare @CityMap table (ID_city int primary key, City varchar(10), cnt int, rn int)
insert into @CityMap
select c.*
, count(*) over (partition by City) as cnt
, ROW_NUMBER() over (partition by City order by City, ID_city) as rn
from @City c;
--select * from @CityMap m where m.cnt > 1 order by m.City, m.ID_city;
update c
set c.ID_Location = f.ID_Location
from @Customer c
join @LocationMap m
on m.ID_Location = c.ID_Location
and m.rn > 1
join @LocationMap f
on f.zip = m.zip
and f.rn = 1; -- f is the first (kept) location for the same zip
select c.* from @Customer c order by c.Cust_ID
update s
set s.ID_Location = f.ID_Location
from @shop s
join @LocationMap m
on m.ID_Location = s.ID_Location
and m.rn > 1
join @LocationMap f
on f.zip = m.zip
and f.rn = 1; -- f is the first (kept) location for the same zip
select s.* from @shop s order by s.Shop_ID;
--select l.* from @Location l order by l.ID_Location;
update l
set l.ID_City = f.ID_City
from @Location l
join @CityMap m
on m.ID_city = l.ID_City
and m.rn > 1
join @CityMap f
on f.City = m.City
and f.rn = 1;
select l.* from @Location l order by l.ID_Location;
delete l
from @Location l
join @LocationMap m
on l.ID_Location = m.ID_Location
and m.rn > 1;
select * from @Location l order by l.ID_Location;
delete c
from @City c
join @CityMap m
on c.ID_city = m.ID_city
and m.rn > 1;
select * from @City c order by c.ID_city;
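One way to sanity-check the result afterwards is sketched below. It is written against the real tables from the question rather than the table variables above, so swap in the `@` names if you run it in the same batch as the script. It also assumes that a duplicate location means the same `Zip` and `ID_City`, which is my reading of the sample data. Each query should return no rows once the cleanup has worked.

```sql
-- City names that still occur more than once
SELECT City, COUNT(*) AS cnt
FROM City
GROUP BY City
HAVING COUNT(*) > 1;

-- Locations that still share the same zip and city (assumed duplicate rule)
SELECT Zip, ID_City, COUNT(*) AS cnt
FROM [Location]
GROUP BY Zip, ID_City
HAVING COUNT(*) > 1;

-- Locations pointing at a city that no longer exists
SELECT l.ID_Location, l.ID_City
FROM [Location] l
LEFT JOIN City c ON c.ID_City = l.ID_City
WHERE c.ID_City IS NULL;

-- Customers pointing at a location that no longer exists
SELECT cu.Cust_ID, cu.ID_Location
FROM Customer cu
LEFT JOIN [Location] l ON l.ID_Location = cu.ID_Location
WHERE l.ID_Location IS NULL;

-- Shops pointing at a location that no longer exists
SELECT s.Shop_ID, s.ID_Location
FROM Shop s
LEFT JOIN [Location] l ON l.ID_Location = s.ID_Location
WHERE l.ID_Location IS NULL;
```

If foreign key constraints are enforced on these tables, the three orphan checks are redundant, because the deletes would have failed instead of leaving dangling references.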