Question

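I have a database generated from XML documents that contain duplicate records. I know how to delete a record from the main table, but not the records that reference it through foreign key constraints.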

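I have a large number of XML documents, and they were inserted without any check for duplicates. One way to remove the duplicates is to delete the rows with the lower Primary_Key values (along with all related foreign key records) and keep the highest one. However, I don't know how to do that.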

The database looks like this:

Table 1: [type]

+-------------+---------+-----------+
| Primary_Key | Food_ID | Food_Type |
+-------------+---------+-----------+
|   70001     |  12345  |  fruit    |
|   70002     |  12345  |  fruit    |
|   70003     |  12345  |  meat     |
+----^--------+---------+-----------+
     |
     |-----------------|
                       |    
                       | Linked to primary key in the first table
+-------------+--------v--------+-------------+-------------+------------+
| Primary_Key |  Information_ID |   Food_Name | Information |  Comments  | 
+-------------+-----------------+-------------+-------------+------------+
|   0001      |     70001       |   banana    |  buy @ toms | delicious! |
|   0002      |     70002       |   banana    |  buy @ mats | so-so      |
|   0003      |     70003       | decade meat |  buy @ sals | disgusting |
+-------------+-----------------+-------------+-------------+------------+

^ Table 2: [food_information]

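There are several other linked tables, all of which have foreign key values that match the primary key values in the main table ([type]).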

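My question, depending on which solution is likely to be best:

1. How do I delete all records except 70003 (the highest one)? We cannot tell whether a record is a duplicate unless its [Food_ID] appears more than once. If it does, we need to delete records from all tables (there are 10) based on the Primary_Key and Foreign_Key relationships.
2. How do I update/merge these SQL records at insert time, so that I don't have to delete duplicates again?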

I prefer #1, because it saves me from rebuilding the database and makes inserting easier.

Thanks!

Answer 1

Something like...

Assuming:

create table food (
  primary_key int,
  food_id int,
  food_type varchar(20)
  );

insert into food values (70001,12345,'fruit'); 
insert into food values (70002,12345,'fruit'); 
insert into food values (70003,12345,'meat'); 
insert into food values (70004,11111,'taco'); 

create table info (
  primary_key int,
  info_id int,
  food_name varchar(20)
  );

insert into info values (1,70001,'banana'); 
insert into info values (2,70002,'banana'); 
insert into info values (3,70003,'decade meat'); 
insert into info values (4,70004,'taco taco');

Then...

-- yields:   12345   70003 

-- group by is needed because food_id is selected next to an aggregate
select  food_id, max(info_id) as max_info_id
from    food 
        join info on food.primary_key=info.info_id
where   food_id in ( 
            select food_id
            from   food 
                   join info on food.primary_key=info.info_id
            group  by food_id
            having count(*)>1)
group   by food_id;
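
As a rough check of the lookup above, here is a minimal sketch that runs it against the sample rows. It assumes SQLite through Python's sqlite3 module as a stand-in for whatever database is really in use, and it folds the where ... in subquery into a single group by / having, which should return the same row. The variable name keepers is made up for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the real server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table food (primary_key int, food_id int, food_type varchar(20));
    insert into food values (70001,12345,'fruit');
    insert into food values (70002,12345,'fruit');
    insert into food values (70003,12345,'meat');
    insert into food values (70004,11111,'taco');

    create table info (primary_key int, info_id int, food_name varchar(20));
    insert into info values (1,70001,'banana');
    insert into info values (2,70002,'banana');
    insert into info values (3,70003,'decade meat');
    insert into info values (4,70004,'taco taco');
""")

# One row per food_id that occurs more than once, with the key to keep.
keepers = conn.execute("""
    select   food.food_id, max(info.info_id) as max_info_id
    from     food
             join info on food.primary_key = info.info_id
    group by food.food_id
    having   count(*) > 1
""").fetchall()

print(keepers)  # [(12345, 70003)]
```

The taco row (food_id 11111) occurs once, so it never shows up in the result.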

Then... something like this picks out the rows to get rid of. There may be a better way to write this; I'm still thinking about it.

select  *
from    food 
        join info on food.primary_key=info.info_id 
        join ( select  food_id, max(info_id) as max_info_id
               from    food 
                       join info on food.primary_key=info.info_id
               where   food_id in ( 
                           select food_id
                           from   food 
                                  join info on food.primary_key=info.info_id
                           group  by food_id
                           having count(*)>1) 
               group   by food_id
               ) as dont_delete
            on food.food_id=dont_delete.food_id and 
               info.info_id<dont_delete.max_info_id;

Which gives you:

PRIMARY_KEY FOOD_ID FOOD_TYPE   INFO_ID FOOD_NAME   MAX_INFO_ID
70001       12345   fruit       70001   banana      70003
70002       12345   fruit       70002   banana      70003

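So you can run delete from info where info_id in (select food.primary_key from that_big_query_up_there) and then delete from food where primary_key in (select food.primary_key from that_big_query_up_there). Because that_big_query_up_there joins both tables, save its primary_key values first (in a temporary table, for example); otherwise the first delete removes the rows the second one relies on. If the foreign key is enforced, delete from info before food.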
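
A minimal sketch of that delete sequence, again assuming SQLite through Python's sqlite3 module. The foreign key declaration, the temporary table name doomed, and the alias keep_key are assumptions added for illustration. The duplicate lookup is also simplified to use food alone, since food_id and primary_key both live there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite leaves enforcement off by default
conn.executescript("""
    create table food (primary_key int primary key, food_id int, food_type varchar(20));
    create table info (primary_key int primary key,
                       info_id int references food(primary_key),  -- assumed FK
                       food_name varchar(20));
    insert into food values (70001,12345,'fruit');
    insert into food values (70002,12345,'fruit');
    insert into food values (70003,12345,'meat');
    insert into food values (70004,11111,'taco');
    insert into info values (1,70001,'banana');
    insert into info values (2,70002,'banana');
    insert into info values (3,70003,'decade meat');
    insert into info values (4,70004,'taco taco');
""")

# Snapshot the keys to remove before touching either table.
conn.execute("""
    create temp table doomed as
    select f.primary_key
    from   food f
           join (select   food_id, max(primary_key) as keep_key
                 from     food
                 group by food_id
                 having   count(*) > 1) k
             on f.food_id = k.food_id and f.primary_key < k.keep_key
""")

with conn:  # both deletes commit together, or roll back together on error
    # Child rows first, so the foreign key is never violated.
    conn.execute("delete from info where info_id in (select primary_key from doomed)")
    conn.execute("delete from food where primary_key in (select primary_key from doomed)")

remaining_food = [r[0] for r in conn.execute("select primary_key from food order by primary_key")]
remaining_info = [r[0] for r in conn.execute("select info_id from info order by info_id")]
print(remaining_food, remaining_info)  # [70003, 70004] [70003, 70004]
```

With ten linked tables, the same doomed list can drive one delete per child table before the final delete on the parent.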

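For the future, you might consider a unique constraint on food, such as unique(primary_key,food_id) or something similar. But if the relationship is one-to-one, why not store them together...?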

Answer 2

Even when a [foodID] has no duplicates, max(Primary_Key) still returns its row, so that row will not be deleted.

The where condition uses not in:

delete tableX 
 where tableX.informationID not in ( select max(Primary_Key) 
                                       from [type] 
                                   group by [foodID] )


Then do [type] last:


delete [type] 
 where [type].[Primary_Key] not in ( select max(Primary_Key) 
                                       from [type] 
                                      group by [foodID] )
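
Here is a minimal sketch of this children-first, parent-last order, assuming SQLite through Python's sqlite3 module and the column names from the question (Food_ID, Information_ID). The child_tables list and the extra non-duplicate row 70004 are illustrative assumptions. In practice the list would hold every linked table together with its foreign key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite leaves enforcement off by default
conn.executescript("""
    create table [type] (Primary_Key integer primary key, Food_ID int, Food_Type text);
    create table food_information (
        Primary_Key    integer primary key,
        Information_ID int references [type](Primary_Key),
        Food_Name      text);
    insert into [type] values
        (70001,12345,'fruit'), (70002,12345,'fruit'),
        (70003,12345,'meat'),  (70004,11111,'taco');   -- 70004 has no duplicate
    insert into food_information values
        (1,70001,'banana'), (2,70002,'banana'),
        (3,70003,'decade meat'), (4,70004,'taco taco');
""")

# Highest Primary_Key per Food_ID: exactly the rows to keep.
keep = "select max(Primary_Key) from [type] group by Food_ID"

# (table, foreign key column) pairs; hypothetical list, extend per linked table.
child_tables = [("food_information", "Information_ID")]

with conn:  # all deletes commit together, or roll back together on error
    for table, fk in child_tables:
        conn.execute(f"delete from {table} where {fk} not in ({keep})")
    # The parent goes last; every child delete above depends on its rows.
    conn.execute(f"delete from [type] where Primary_Key not in ({keep})")

kept_types = [r[0] for r in conn.execute("select Primary_Key from [type] order by Primary_Key")]
kept_info = [r[0] for r in conn.execute(
    "select Information_ID from food_information order by Information_ID")]
print(kept_types, kept_info)  # [70003, 70004] [70003, 70004]
```

Row 70004 survives because it is the maximum of its own single-row group.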

Then create a unique constraint on [foodID].
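
Once the duplicates are gone, a unique constraint also gives the insert code something to react to. This is a minimal sketch, assuming SQLite through Python's sqlite3 module. The index name ux_type_food_id and the helper save_food are hypothetical, and updating the existing row is only one possible merge policy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table [type] (Primary_Key integer primary key, Food_ID int, Food_Type text);
    create unique index ux_type_food_id on [type](Food_ID);
""")

def save_food(conn, food_id, food_type):
    """Insert a food, or update the existing row when Food_ID is already present."""
    try:
        with conn:
            conn.execute("insert into [type] (Food_ID, Food_Type) values (?, ?)",
                         (food_id, food_type))
    except sqlite3.IntegrityError:
        # The unique index rejected the duplicate; merge into the existing row.
        with conn:
            conn.execute("update [type] set Food_Type = ? where Food_ID = ?",
                         (food_type, food_id))

save_food(conn, 12345, "fruit")
save_food(conn, 12345, "meat")   # same Food_ID: updates instead of duplicating

rows = conn.execute("select Food_ID, Food_Type from [type]").fetchall()
print(rows)  # [(12345, 'meat')]
```

Because the existing row keeps its Primary_Key, rows in the linked tables continue to point at a valid parent.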

How do I delete duplicate records, or merge them, while keeping foreign key constraints intact?
