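I have a database that was generated from XML documents containing duplicate records. I know how to delete a record from the main table, but not how to delete the records that reference it through foreign key constraints.
I have a large number of XML documents, and they were inserted without any check for duplicates. One way to remove the duplicates is to delete the rows with the lower Primary_Key values (along with all related foreign-key records) and keep the row with the highest value. I don't know how to do that, though.
The database looks like this: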
Table 1: [type]
+-------------+---------+-----------+
| Primary_Key | Food_ID | Food_Type |
+-------------+---------+-----------+
| 70001       | 12345   | fruit     |
| 70002       | 12345   | fruit     |
| 70003       | 12345   | meat      |
+----^--------+---------+-----------+
     |
     |-----------------|
                       |
                       | Linked to primary key in the first table
+-------------+--------v--------+-------------+-------------+------------+
| Primary_Key | Information_ID  | Food_Name   | Information | Comments   |
+-------------+-----------------+-------------+-------------+------------+
| 0001        | 70001           | banana      | buy @ toms  | delicious! |
| 0002        | 70002           | banana      | buy @ mats  | so-so      |
| 0003        | 70003           | decade meat | buy @ sals  | disgusting |
+-------------+-----------------+-------------+-------------+------------+
^ Table 2: [food_information]
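There are several other linked tables, all of which have foreign keys that match the primary key values in the main table ([type]).
My question is which solution would be best:
I prefer #1, because it saves me from rebuilding the database and makes inserting easier.
Thanks!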
Answer 0 (score: 0)
Something like...
Given:
create table food (
    primary_key int,
    food_id     int,
    food_type   varchar(20)
);
insert into food values (70001,12345,'fruit');
insert into food values (70002,12345,'fruit');
insert into food values (70003,12345,'meat');
insert into food values (70004,11111,'taco');
create table info (
    primary_key int,
    info_id     int,
    food_name   varchar(20)
);
insert into info values (1,70001,'banana');
insert into info values (2,70002,'banana');
insert into info values (3,70003,'decade meat');
insert into info values (4,70004,'taco taco');
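Then...
-- yields: 12345 70003
-- (group by added so the aggregate is valid on engines that reject
--  a bare food_id next to max(), and so each duplicated food_id gets its own row)
select food_id, max(info_id) as max_info_id
from food
join info on food.primary_key=info.info_id
where food_id in (
    select food_id
    from food
    join info on food.primary_key=info.info_id
    group by food_id
    having count(*)>1)
group by food_id;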
Then... something like this, which gives you the rows to delete. There may be a better way to write this... I'm thinking about it.
select *
from food
join info on food.primary_key=info.info_id
join ( select food_id, max(info_id) as max_info_id
       from food
       join info on food.primary_key=info.info_id
       where food_id in (
           select food_id
           from food
           join info on food.primary_key=info.info_id
           group by food_id
           having count(*)>1)
       group by food_id  -- one max_info_id per duplicated food_id
     ) as dont_delete
  on food.food_id=dont_delete.food_id and
     info.info_id<max_info_id;
Gives you:
PRIMARY_KEY  FOOD_ID  FOOD_TYPE  INFO_ID  FOOD_NAME  MAX_INFO_ID
70001        12345    fruit      70001    banana     70003
70002        12345    fruit      70002    banana     70003
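So you can run delete from info where info_id in (select food.primary_key from that_big_query_up_there)
and delete from food where primary_key in (select food.primary_key from that_big_query_up_there).
Delete from info first if a foreign key is enforced, and save the keys returned by that query (in a temporary table, say) before deleting anything: the query joins both tables, so once either delete has run it no longer returns those rows.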
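The two deletes can be tried end to end with a small script. This is only a sketch, assuming SQLite through Python's built-in sqlite3 module rather than whatever engine the question uses. The doomed temporary table and its doomed_key column are made-up names, and the key list is computed from food alone (every primary_key below the highest one for its food_id), which is a simplification of the big query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table food (primary_key int, food_id int, food_type varchar(20));
insert into food values (70001,12345,'fruit');
insert into food values (70002,12345,'fruit');
insert into food values (70003,12345,'meat');
insert into food values (70004,11111,'taco');
create table info (primary_key int, info_id int, food_name varchar(20));
insert into info values (1,70001,'banana');
insert into info values (2,70002,'banana');
insert into info values (3,70003,'decade meat');
insert into info values (4,70004,'taco taco');
""")

# Save the keys to delete first (hypothetical temp table "doomed"):
# every food row whose primary_key is below the highest one for its food_id.
conn.execute("""
create temp table doomed as
select f.primary_key as doomed_key
from food f
join (select food_id, max(primary_key) as keep_key
      from food
      group by food_id) k
  on f.food_id = k.food_id
where f.primary_key < k.keep_key
""")

# Child rows first, then the parent rows they point at.
conn.execute("delete from info where info_id in (select doomed_key from doomed)")
conn.execute("delete from food where primary_key in (select doomed_key from doomed)")

remaining_food = [r[0] for r in conn.execute(
    "select primary_key from food order by primary_key")]
remaining_info = [r[0] for r in conn.execute(
    "select info_id from info order by info_id")]
print(remaining_food, remaining_info)
```

With the sample rows, only 70003 and 70004 should survive in both tables. Any other linked tables would need the same child-side delete before the delete on food.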
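To avoid the problem in future, consider a unique constraint on food
... unique(primary_key,food_id)
or something similar. But if the relationship is one-to-one, why not store them together...?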
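A quick way to see what a unique constraint buys you is to try inserting a duplicate after the cleanup. The sketch below again assumes SQLite via Python's sqlite3, and it puts the index on food_id alone, which is my assumption and not the constraint written above: if primary_key is already unique by itself, a constraint on the pair (primary_key, food_id) would not reject a repeated food_id. The index name food_food_id_uq is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table food (primary_key int, food_id int, food_type varchar(20));
insert into food values (70003,12345,'meat');
insert into food values (70004,11111,'taco');
-- Assumed variant: unique on food_id only (hypothetical index name).
create unique index food_food_id_uq on food (food_id);
""")

# A second row for food_id 12345 should now be refused.
try:
    conn.execute("insert into food values (70005,12345,'fruit')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

food_count = conn.execute("select count(*) from food").fetchone()[0]
print(duplicate_rejected, food_count)
```

The index can only be created once the existing duplicates are gone, so it belongs after the deletes.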
Answer 1 (score: 0)
Even where [foodID] has no duplicates you still get a max(Primary_Key) for it, so the not in condition in the where clause will not delete that row.
delete tableX
where tableX.informationID not in ( select max(Primary_Key)
                                    from [type]
                                    group by [foodID] )
Then just do [type] last:
delete [type]
where [type].[Primary_Key] not in ( select max(Primary_Key)
                                    from [type]
                                    group by [foodID] )
Then create a unique constraint on [foodID].
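Here is a runnable sketch of that order of operations, assuming SQLite via Python's sqlite3 and the column names from the question's tables (Food_ID and Information_ID, not foodID and informationID). The 70004 row is an extra, non-duplicated row added only to show that it is kept. SQLite needs delete from, whereas the statements above omit from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table [type] (Primary_Key int, Food_ID int, Food_Type varchar(20));
insert into [type] values (70001,12345,'fruit');
insert into [type] values (70002,12345,'fruit');
insert into [type] values (70003,12345,'meat');
insert into [type] values (70004,11111,'taco');  -- extra row, no duplicate
create table food_information (Primary_Key int, Information_ID int, Food_Name varchar(20));
insert into food_information values (1,70001,'banana');
insert into food_information values (2,70002,'banana');
insert into food_information values (3,70003,'decade meat');
insert into food_information values (4,70004,'taco taco');
""")

# The highest Primary_Key per Food_ID: these are the rows to keep.
keepers = "select max(Primary_Key) from [type] group by Food_ID"

# Linked table first, while [type] still holds every row ...
conn.execute(
    "delete from food_information where Information_ID not in (" + keepers + ")")
# ... then [type] itself, last.
conn.execute(
    "delete from [type] where Primary_Key not in (" + keepers + ")")

type_keys = [r[0] for r in conn.execute(
    "select Primary_Key from [type] order by Primary_Key")]
info_keys = [r[0] for r in conn.execute(
    "select Information_ID from food_information order by Information_ID")]
print(type_keys, info_keys)
```

Doing [type] last matters for more than the foreign key: the subquery reads [type], so it has to be intact while the linked tables are cleaned.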